A  STOCHASTIC  RESPONSE  MODEL  WITH  APPLICATION 
TO  BRAND  CHOICE* 

David  B.  Montgomery 
Massachusetts  Institute  of  Technology 

Abstract 

A  stochastic  response  model  for  the  binary  response  situation  is  developed. 
The  general  model  is  specified  via  a  system  of  axioms  similar  in  nature  to  those 
used  in  stimulus  sampling  theory.   The  independent  elements  specification  of  the 
model  is  shown  to  be  unsatisfactory  as  a  response  model.   However,  the  cohesive 
elements  specification  is  shown  to  be  theoretically  viable.   In  its  infinite 
element  form  it  is  termed  a  probability  diffusion  model.   A  minimum  chi  square 
procedure  is  developed  for  estimating  and  testing  the  model.   In  an  empirical 
test  it  was  found  to  yield  an  excellent  fit  to  brand  choice  in  the  dentifrice 
market.   The  model  is  applicable  in  any  binary  choice  situation  in  which  a 
heterogeneous,  non-stationary  zero  order  model  seems  reasonable. 


*Much  of  the  work  reported  in  this  paper  is  from  my  doctoral  dissertation 
submitted  to  the  Graduate  School  of  Business,  Stanford  University,  in  May 
1966.   I  am  indebted  to  the  Richard  D.  Irwin  Foundation  and  to  Stanford 
University  for  financial  support.   I  am  especially  indebted  to 
Professor  William  F.  Massy,  the  chairman  of  my  thesis  committee. 


I.   THE  RESEARCH  CONTEXT 

Stochastic  models  of  behavior  are  finding  increasing  application  in  such 
diverse  fields  as  psychology,  sociology,  biology,  and  marketing.   For  example, 
see [3], [53], and [55]. Stochastic models allow for the multitude of
factors which affect behavior by means of response uncertainty. Thus the
problem of describing and predicting behavior in a stochastic model is reduced
to the problem of specifying, estimating, and testing a probability law for the
behavior of interest. It should be pointed out that the specification of this
law itself generally incorporates important behavioral assumptions.²
Behavior may often be described by relatively simple stochastic models, while
exceedingly complex deterministic models would be required. It should be
noted that we are not necessarily assuming that respondents make their
responses via a probability mechanism, but that we account for our own
ignorance of many of the factors which determine response via the simple, but
appealing mechanism of response uncertainty.

In this discussion we shall consider the case of brand choice, since this
behavior is of interest in the empirical case we consider in Section 6.
However, the model and the discussion are much more general, as will be pointed
out in Section 7.

Several factors complicate the development of stochastic models of consumer
brand choice behavior.³ In the first place, several alternative models may prove
to be consistent with the brand choice data of interest. Consequently,
methods are needed for discriminating among competing models.⁴ Secondly, the


1 See Massy [1966] and Coleman [1964a].

2 For example, linear learning models incorporate the behavioral assumption that
the entire history of previous responses conditions the probability of the
response alternatives on any given response occasion.

3 For a more complete discussion of these factors with illustrative examples,
see Morrison [1965] and Montgomery and Urban [1967, Chapter 3].
presence  of  heterogeneity  among  consumers  may  cause  us  to  conclude  erroneously 
that  the  brand  choice  process  is  of  higher  order  than  it  actually  is.    Also, 
non-stationarity  of  the  response  probabilities  may  lead  to  the  same  result. 
Thirdly,  the  stochastic  process  which  gives  rise  to  the  response  probabilities 
may  itself  undergo  change.   Thus  we  have  need  for  models  which  assume 
stationarity  of  the  process  for  only  relatively  short  intervals  of  time. 
Finally,  in  models  of  brand  choice  we  often  aggregate  many  brands  into  an  "all 
other"  category.   This  has  implications  for  the  class  of  stochastic  operators 
which  may  be  used  if  this  combining  of  classes  is  to  leave  the  basic  structure 
of  the  system  unchanged.   This  latter  problem  would  seem  to  be  relatively  less 
serious  than  the  other  three  at  this  stage  of  model  development. 

What  properties  would  we  like  to  see  in  our  stochastic  models  of  brand 
choice?   These  properties  basically  relate  to  two  aspects  of  our  model  building 
activity:   model  structure  and  statistical  properties.   Some  desirable 
properties  are: 
Model  Properties 

1.  Heterogeneity.   Consumers  should  be  allowed  to  differ  with  respect  to  their 
probabilities  of  choosing  alternative  brands. 

2.  Non-stationarity  of  Response  Probability.   There  should  be  some  mechanism 
for  a  consumer's  brand  choice  probabilities  to  change.   Changes  may  be  the 
result  of  marketing  activities,  product  experience,  or  other  factors 
internal  or  external  to  the  consumer. 

3.  Measures  of  Interest.   The  model  should  furnish  measures  of  interest  to 
the  marketing  manager  and/or  the  management  scientist.   An  example  of  such 
a  measure  might  be  the  expected  market  share  per  brand  in  the  steady-state. 


4 See  Massy,  Montgomery,  and  Morrison  [1967,  Chapter  2]  and  Morrison  [1966]  for 
some  approaches  to  this  problem. 

For  an  empirical  case  in  point,  see  Massy  [1966]. 

See  Bush,  Mosteller,  and  Thompson  [1954]  or  Bush  and  Mosteller  [1955]. 
Statistical  Properties 

1.  Data  Based.   The  models  should  relate  to  data  which  either  are  available 
or  else  are  obtainable. 

2.  Test  of  Goodness  of  Fit.   We  need  a  measure  of  the  descriptive  adequacy  of 
a  model  for  a  set  of  data  before  we  should  turn  to  an  interpretation  of 
the  parameters  and  measures  of  interest  from  the  model. 

3.  Estimation Procedures. We need to be able to estimate the model from
data. The estimates should be computationally feasible and have known
properties.

4.  Empiric Viability. The model should be consistent with at least one set
of brand choice data in order to remain a strong candidate model. In any
case, empiric viability is an important aspect of any stochastic model.

There are three major classes of stochastic models which have been applied
to consumer behavior: zero order, Markovian, and learning. Zero order models
were  implicit  in  the  early  work  on  brand  choice  by  Brown  [1952]  and  Cunningham 
[1956].   Measures  of  purchasing  behavior  from  a  zero  order  simulation  model  as 
compared  to  actual  measures  from  consumers  led  Massy  and  Frank  [1964]  to 
conclude  that  brand  switching  behavior  for  coffee  and  tea  is  well  approximated 
by  a  zero  order  process.   Beer  was  found  to  require  a  higher  order  process. 
The  empirical  likelihood  of  a  spurious  inference  of  a  higher  order  switching 
process  due  to  aggregation  of  heterogeneous  consumers  has  been  demonstrated  by 
Frank  [1962]  and  Massy  [1966].   Morrison  [1965]  has  developed  statistical  tests 
for  a  heterogeneous  population  of  zero  order  consumers  whose  response  probabilities 


The  data  based  aspect  of  a  model  is  particularly  important  when  our  interest 
centers  upon  questions  relating  to  the  structure  of  behavior.   For  example, 
our  interest  might  center  upon  the  question  of  whether  consumers  are  more 
nearly  described  by  a  non-stationary  zero  order  process  or  by  a  linear  learning 
model  in  terms  of  their  brand  choice  behavior.   Note,  however,  that  in  some 
cases  a  manager  may  be  willing  to  use  subjective  parameter  estimates  in  a 
stochastic  model  used  for  predictive  purposes.
remain constant over a few purchases. In his empirical work he assumed
stationarity over sequences of five trials. Howard [1965] has proposed a
heterogeneous, non-stationary zero order model in which the underlying
parameters of the stochastic process which generates observable outcomes are
themselves subject to change at times determined by yet another stochastic
process. He presents an example of a model having a Bernoulli observable
process, a beta parameter distribution, and a geometric distribution for the
time between parameter changes.

Examples of Markov models of consumer behavior may be found in Maffei
[1960], Herniter and Magee [1961], Harary and Lipstein [1962], and Styan and
Smith [1964]. Telser [1962] used a variant of the Markov brand switching
model in which the transition probabilities were made functions of brand prices
in the product market. Estimates of the price elasticities for branded goods
are available from this approach. Morrison [1966] generalized previous
approaches by formulating a model in which consumers may differ from each other
with respect to their transition probabilities. Lipstein [1965] has developed
a non-stationary Markov model relating advertising effort to attitude changes
and consumer purchases.

The first application of a learning model is due to Kuehn [1958 and 1962],
who used a modified form of the Bush and Mosteller linear learning model. Carman
[1966] has presented an empirical test of the linear learning model. Haines
[1964] modified the rejection operator in the linear learning model and developed
the aggregate market implications of this model in the period following a
market innovation. He used the model to develop summary measures of market
dynamics (expected equilibrium market share and rate of approach to equilibrium)
and then related these summary measures to marketing policy variables and
market characteristics. Massy [1965] has provided estimation procedures for
the linear learning model when consumers are initially heterogeneous, in
contrast to the initial trial homogeneity assumption of previous applications.
Demetz [1962] used a somewhat different concept of learning in an econometric
study of the propensity for consumers to learn to ignore artificial brand
distinctions.⁸

The  model  developed  and  tested  in  this  paper  may  be  described  as  a 
heterogeneous,  non-stationary  zero  order  process.   In  contrast  to  the  Howard 
[1965]  model  in  which  change  occurs  at  discrete  points  of  time,  the  present 
model  allows  changes  in  response  probabilities  to  occur  in  continuous  time. 

II.   MEASURES  FROM  THE  MODEL 

Recall  from  the  previous  section  that  measures  of  interest  to  the 
marketing  manager  and/or  the  management  scientist  constitute  an  important 
property  of  a  stochastic  model.   Before  turning  to  the  development  and  testing 
of  the  model,  it  seems  useful  to  review  briefly  certain  of  the  measures  of 
market  dynamics  which  may  be  obtained  from  the  present  model. 

The  question  of  the  nature  and  extent  of  brand  loyalty  has  long  been  of 
concern  in  marketing  circles.   An  appealing  measure  of  a  consumer's  loyalty 
toward  a  brand  is  his  probability  of  purchasing  that  brand  on  some  purchase 
occasion. In the present model we are able to estimate the cross-sectional
distribution of response probability across the population of consumers on any
given trial.⁹ Thus we may estimate the distribution of brand loyalty toward a


8 For a more detailed discussion of these applications and the research context
of stochastic models of consumer behavior, see Montgomery and Urban [1967,
Chapter 3].

9 The present paper concentrates upon an empirical test of the model's viability.
For the procedure for estimating the cross-sectional distribution, see
Montgomery [1966a].
brand  on  successive  trials. 

In  this  model  consumers  are  allowed  to  change  their  response  (brand  of 
purchase)  probabilities  between  trials  (purchase  occasions).   In  the  present 
model  we  obtain  parameter  estimates  which  represent  the  propensity  for  a 
consumer  to  increase  or  decrease  his  response  probability  between  trials. 

Further  measures  include  the  expected  equilibrium  choice  (purchase)  share 
for  the  brand  of  interest  as  well  as  the  rate  at  which  the  market  will  approach 
equilibrium from any disequilibrium position. The total equilibrium distribution
of brand loyalty is also estimable. The minimum chi-square estimation
procedure  discussed  in  Section  6  also  yields  a  measure  of  the  "goodness  of 
fit"  or  descriptive  adequacy  of  the  model.   Note  that  an  adequate  "fit"  is  a 
prerequisite  to  use  of  the  model's  measures  in  any  empirical  situation. 

III.   THE  GENERAL  MODEL 

Attention  will  be  confined  to  models  of  binary  response  —  i.e.,  there 
are  two  mutually  exclusive  and  collectively  exhaustive  response  alternatives. 
In  the  case  of  consumer  brand  choice  this  means  that  the  brands  in  the  market 
will  be  classified  into  two  brand  alternatives,  A  and  B.   Brand  A  might 
represent  our  brand  while  Brand  B  might  represent  an  aggregate  competitive 
brand.   Alternatively,  Brand  A  might  represent  a  family's  favorite  brand  while 
Brand  B  might  represent  any  non-favorite  brand. 

At  this  point,  we  define  certain  terms  which  will  be  used  in  this 
discussion. 

Response  Occasion.   A  response  occasion  is  the  event  that  the  consumer 
makes  a  purchase  of  either  Brand  A  or  Brand  B. 


An  iterative  technique  for  obtaining  this  estimate  will  be  available  in  a 
future  working  paper. 


Response  Probability.   This  is  the  probability  that  Brand  A  will  be 
purchased  on  any  given  purchase  or  response  occasion.   It  is  the  probability 
that  Brand  A  is  chosen  conditional  on  there  being  a  purchase  of  one  of  the 
two  alternative  brands. 

State-Space of the Latent Markov Process. The postulated latent Markov
process operates on the response probabilities. The state-space of this
latent Markov process is the set of all possible values of the response
probability. This state space may be finite or continuous, depending upon the
particular model.

Response Elements. The response elements postulated in the axioms below
are hypothetical constructs which are useful as a framework for the derivation
of the dynamic properties of latent Markov models. These response elements are
conceptually similar to the stimulus elements of stimulus sampling theory¹¹
and are used in an analogous manner. As in the stimulus sampling models from
mathematical learning theory, there is no need in the present case to achieve
an isomorphism between these hypothetical response elements and any overt real
world phenomenon. They simply represent a useful conceptual construct in the
model development.

Axioms  of  the  General  Model 

The  development  of  the  general  model  as  well  as  the  specification  of  it 

given  in  the  following  two  sections  will  be  facilitated  by  the  presentation  of 

a  sufficient  set  of  axioms  for  the  derivation  of  the  general  model.   The  axioms 

are  classified  into  three  subclasses:   specification,  response,  and  Markov 

process. 


11 See, for example, Atkinson, Bower, and Crothers [1965].
Specification Axioms

(S1)  All individuals in the population behave according to the same
continuous time Markov process.

(S2)  On any response occasion there are two mutually exclusive and
collectively exhaustive responses, A and B.

(S3)  Within each individual there are N (possibly infinitely many)
hypothetical response elements.

(S4)  At any given time t, each of the N response elements within an
individual is uniquely associated with either response A or B.

It should be remarked that by Axiom S1 we have assumed that all individuals
in the population are identical with respect to the process which determines the
dynamics of their response probabilities. This does not, however, say that
individuals have the same response probability at any given time, t. In fact,
in the development of the two models we shall explicitly postulate that at t=0
the individuals in the population are distributed with respect to their initial
probability of purchasing Brand A. That is, they are heterogeneous with
respect to their response probabilities.

Response  Axioms 

(R1)   If at time t an individual has i of his elements associated with
response A and if he makes a response at time t, then his
probability of making response A at time t will be

$$P\{A \text{ at time } t \mid i \text{ elements associated with } A\} = \frac{i}{N}$$

where N is the total number of response elements.

(R2)   Pseudo-Bernoulli Trials Assumption. The history of the process does
not affect the probability of response A at time t. Formally,

$$P[A(t)] = P[A(t) \mid A(t-1)] = P[A(t) \mid B(t-1)]$$

(R3)   Individuals  in  the  population  respond  independently  of  one  another. 

The  role  of  the  hypothetical  response  elements  is  indicated  by  Axiom  R1. 
This  axiom  states  that  an  individual's  probability  of  making  response  A  at  any 
time  t  is  just  equal  to  the  proportion  of  his  response  elements  which  are 
associated with response A at time t. Axiom R2 is termed the pseudo-Bernoulli
trials assumption because it captures the essence of Bernoulli trials in that
there are two mutually exclusive and collectively exhaustive response
possibilities and the probability of the occurrence of these alternatives is
independent of the history of the process. It is at this point that the current
formulation stands in direct contrast to learning models, which postulate that
the particular sampling function observed is the only factor which causes the
probability of choosing Brand A on future trials to change. The adjective
"pseudo"  is  used  in  recognition  of  the  fact  that  "true"  Bernoulli  trials  require 
that  the  process  generating  the  trials  remains  stationary.   As  will  become 
evident  from  the  Markov  process  axioms  discussed  below,  the  response 
probability  for  any  given  individual  is  subject  to  change  over  time. 

Axiom  R3  will  prove  to  be  of  considerable  importance  when  we  consider 
problems  of  aggregation  and  estimation.   Perhaps  it  would  be  well  to  state  at 
this  point  that  in  applications  of  these  models  to  consumer  panel  data  this 
axiom is fulfilled by the nature of the data gathering process. Panel
households are sufficiently dispersed geographically to assure us of nearly zero
interpersonal  interaction. 

In  summary,  Axiom  R2  assumes  that  for  any  given  individual  responses  will 
be  independent  over  time.   Axiom  R3  assures  us  that  the  responses  are  independent 
cross-sectionally  between  individuals.   Hence  all  responses  of  all  individuals 
are  postulated  by  the  model  to  be  independent. 

Markov  Process  Axioms 

The  state  space  will  be  an  individual's  response  probability.   Since  each 
individual  in  the  population  is  assumed  to  have  the  same  number  of  response 
elements,  N,  an  equivalent  state  space  would  be  the  number  of  response  elements 
associated with response A. We shall denote the states of this latter state
space by i. In addition, we shall use the symbol $o(\Delta t)$ to denote terms of an
order which tend to zero faster than $\Delta t$.

(M1)   If at time t an individual is in state i (i = 0, 1, ..., N-1), the
probability of the transition $i \to i+1$ in the interval $(t, t+\Delta t)$
is $\lambda_i \Delta t + o(\Delta t)$.

(M2)   If at any time t an individual is in state i (i = 1, 2, ..., N), the
probability of the transition $i \to i-1$ in the interval $(t, t+\Delta t)$ is
$\mu_i \Delta t + o(\Delta t)$.

(M3)   The probability of a transition to other than a neighboring state
is $o(\Delta t)$. Formally, the probability of the transition $i \to j$ for
$|i-j| > 1$ is $o(\Delta t)$. Hence

$$\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0$$

(M4)   The process is stationary. That is, $\lambda_i$ and $\mu_i$ are independent of
time.

Note that by Axioms M1 - M4 we have postulated a general birth-death
process on the response elements in terms of their association with either
response A or B. For N (the total number of response elements) finite, the
Markov axioms specify a birth-death process on a population which has an
upper bound, N, and a potential birth pool constrained to N-i if we
consider an element which changes its association from B to A as being "born".

The  above  system  of  axioms  specifies  a  latent  Markov  process  which  operates 
in  continuous  time  upon  the  response  elements  within  any  individual.   It  is 
Markovian  with  respect  to  the  state  space  of  response  probabilities.   The 
Markov  process  is  latent  in  that  it  operates  on  the  unobservable  state  space  of 
response  probabilities. 
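The latent process described by Axioms M1 - M4 can be illustrated with a short simulation. The sketch below is not part of the original development; it assumes the standard construction of a continuous time birth-death process (exponential holding times, then a move to a neighboring state), and the function names and the constant illustrative rates are hypothetical.

```python
import random


def simulate_latent_path(n_elements, birth_rate, death_rate, i0, t_end, rng):
    """Simulate one individual's latent state i(t) on [0, t_end].

    birth_rate(i) and death_rate(i) play the roles of lambda_i and mu_i.
    Returns a list of (time, state) pairs, starting with (0.0, i0).
    """
    t, i = 0.0, i0
    path = [(t, i)]
    while True:
        lam = birth_rate(i) if i < n_elements else 0.0  # no birth from state N
        mu = death_rate(i) if i > 0 else 0.0            # no death from state 0
        total = lam + mu
        if total <= 0.0:
            break  # absorbing state: no further transitions
        t += rng.expovariate(total)  # exponential holding time in state i
        if t > t_end:
            break
        # Only neighboring states can be reached (Axiom M3).
        i = i + 1 if rng.random() < lam / total else i - 1
        path.append((t, i))
    return path


def response_probability(path, n_elements, t):
    """Response probability X(t) = i(t)/N read off a simulated path (Axiom R1)."""
    state = path[0][1]
    for time, i in path:
        if time > t:
            break
        state = i
    return state / n_elements


# Illustrative run with arbitrary constant intensities (an assumption).
rng = random.Random(0)
N = 8
path = simulate_latent_path(N, lambda i: 1.0, lambda i: 1.0, 4, 20.0, rng)
x_at_10 = response_probability(path, N, 10.0)
```

The observable responses do not enter the simulation at all, which reflects the sense in which the process is latent: by Axiom R2 a purchase made at time t depends only on X(t) and leaves the path unchanged.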

From the Markov process axioms M1 - M4 we are able to develop a system of
differential equations on the probability that an individual will occupy a
given state at time t. Let $p_i(t)$ denote the probability that at time t an
individual has i of his response elements associated with response A, a
purchase of Brand A. The system of equations is given by:

(3-1)
$$\frac{dp_0(t)}{dt} = -\lambda_0 p_0(t) + \mu_1 p_1(t) \quad \text{for } i = 0$$

$$\frac{dp_i(t)}{dt} = -(\lambda_i + \mu_i)\,p_i(t) + \lambda_{i-1}\,p_{i-1}(t) + \mu_{i+1}\,p_{i+1}(t) \quad \text{for } 0 < i < N$$

$$\frac{dp_N(t)}{dt} = -\mu_N p_N(t) + \lambda_{N-1}\,p_{N-1}(t) \quad \text{for } i = N$$

In the models developed in the next two sections we further specify
$\lambda_i$ and $\mu_i$. We then examine the steady-state distribution of response
probability.
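For finite N the steady state of system (3-1) can be computed directly once the intensities are given. The sketch below is an illustration rather than the procedure used in this paper: it assumes all death intensities for i = 1, ..., N are positive and uses the standard balance recursion (birth intensity at i times p_i equals death intensity at i+1 times p_{i+1}), which follows from setting the derivatives in (3-1) to zero. The function names and the example intensities are hypothetical.

```python
def steady_state(n_elements, birth_rate, death_rate):
    """Stationary distribution p_0, ..., p_N of a finite birth-death process.

    Assumes death_rate(i) > 0 for i = 1, ..., N.
    """
    weights = [1.0]
    for i in range(n_elements):
        # Balance between neighboring states i and i+1.
        weights.append(weights[-1] * birth_rate(i) / death_rate(i + 1))
    total = sum(weights)
    return [w / total for w in weights]


def time_derivatives(p, n_elements, birth_rate, death_rate):
    """Right-hand side of system (3-1) evaluated at the probability vector p."""
    out = []
    for i in range(n_elements + 1):
        lam_i = birth_rate(i) if i < n_elements else 0.0
        mu_i = death_rate(i) if i > 0 else 0.0
        d = -(lam_i + mu_i) * p[i]
        if i > 0:
            d += birth_rate(i - 1) * p[i - 1]
        if i < n_elements:
            d += death_rate(i + 1) * p[i + 1]
        out.append(d)
    return out


# Arbitrary illustrative intensities (an assumption, not from the paper).
N = 6
birth = lambda i: 2.0 + i
death = lambda i: 1.5 * i
p_star = steady_state(N, birth, death)
residuals = time_derivatives(p_star, N, birth, death)
```

Evaluating `time_derivatives` at the computed distribution gives residuals that vanish up to rounding error, which is a convenient check on any particular specification of the intensities.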


IV.   THE  INDEPENDENT  ELEMENTS  MODEL 


In the Independent Elements Model¹² (IEM) as well as in the remainder of
this paper, we denote the response probability of an individual at time t by
X(t) = i/N, where i is the number of the N response elements associated with
response A at time t and time indexes response occasions. Strictly speaking we
should have i(t), but it will be notationally convenient to suppress direct
expression of the dependence of i upon t.

We must first further specify the properties of the latent Markov model.
In particular, we shall assume that:

(i)    each element associated with response B has transition intensity $\alpha$
toward becoming associated with response A.

(ii)   each element associated with response A has transition intensity $\beta$
toward becoming associated with response B.

(iii)  the  N  response  elements  within  an  individual  behave  independently. 


12 Coleman has proposed this model type in studies of consumer behavior [1963]
and  in  studies  of  voting  behavior  in  small  union  groups  [1964b,  Chapter  11]. 
In  [1964a]  he  develops  a  model  which  the  present  work  generalizes  in  several 
directions.   The  present  work  is  indebted  to  his  results. 
These additional assumptions enable us to specify the $\lambda_i$ and $\mu_i$ of the
general model. In particular,
(4-1)      $$\lambda_i = (N-i)\,\alpha$$
and
(4-2)      $$\mu_i = i\,\beta$$

Equation  (4-1)  represents  the  fact  that  for  h  an  interval  of  time  so  small 
that  at  most  one  element  may  change  its  state  there  are  N-i  elements  associated 
with  response  B  which  could  shift  their  association  to  response  A.   Hence,  in 
order  for  the  state  of  an  individual  to  change  from  i  to  i+1,  one  of  his  N-i 
elements  associated  with  response  B  would  have  to  change  its  association.   This 
can  happen  in  N-i  independent  ways  during  the  interval  h.   Similarly,  we  have 
(4-2). Substituting (4-1) and (4-2) into (3-1) and solving for the steady-state
distribution of i, we find¹³

(4-3)      $$p_i = \frac{N!}{i!\,(N-i)!}\left(\frac{\alpha}{\alpha+\beta}\right)^{i}\left(1 - \frac{\alpha}{\alpha+\beta}\right)^{N-i}$$

Note that we have dropped the index t since we are considering the steady-state
distribution of i. Thus the steady-state or equilibrium distribution of
response probability is just the binomial distribution¹⁴ with mean

(4-4)      $$E[X = i/N] = \frac{\alpha}{\alpha+\beta}$$

and variance

(4-5)      $$\operatorname{Var}[X = i/N] = \frac{1}{N}\,\frac{\alpha}{\alpha+\beta}\left[1 - \frac{\alpha}{\alpha+\beta}\right]$$
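The binomial form of the IEM steady state, together with its mean and variance, can be checked numerically. The sketch below is an illustration under assumed parameter values; it obtains the steady state from the IEM intensities $\lambda_i = (N-i)\alpha$ and $\mu_i = i\beta$ by the usual balance recursion and compares it with the binomial probabilities. The function names and the values of $\alpha$, $\beta$, and N are hypothetical.

```python
from math import comb


def iem_steady_state(n_elements, alpha, beta):
    """Steady state of the IEM from lambda_i = (N-i)*alpha and mu_i = i*beta."""
    weights = [1.0]
    for i in range(n_elements):
        lam_i = (n_elements - i) * alpha
        mu_next = (i + 1) * beta
        weights.append(weights[-1] * lam_i / mu_next)
    total = sum(weights)
    return [w / total for w in weights]


def binomial_pmf(n_elements, q):
    """Binomial probabilities with N trials and success probability q."""
    return [comb(n_elements, i) * q**i * (1.0 - q) ** (n_elements - i)
            for i in range(n_elements + 1)]


def mean_and_variance_of_x(p):
    """Mean and variance of X = i/N under the distribution p over i."""
    n = len(p) - 1
    mean = sum((i / n) * p_i for i, p_i in enumerate(p))
    var = sum((i / n - mean) ** 2 * p_i for i, p_i in enumerate(p))
    return mean, var


# Illustrative parameter values (assumptions).
alpha, beta, N = 0.3, 0.7, 8
p = iem_steady_state(N, alpha, beta)
q = alpha / (alpha + beta)
max_gap = max(abs(a - b) for a, b in zip(p, binomial_pmf(N, q)))
mean_x, var_x = mean_and_variance_of_x(p)

# The variance of X shrinks in proportion to 1/N as N grows.
_, var_large_n = mean_and_variance_of_x(iem_steady_state(800, alpha, beta))
```

Here `max_gap` is zero up to rounding error, and `var_large_n` is one hundredth of `var_x`, in line with the factor 1/N in the variance.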


13 Equation (4-3) is derived using the equations $dp_i(t)/dt = 0$ for
$i = 0, \ldots, N$ together with $\sum_{i=0}^{N} p_i = 1$.

14 Note that this model is equivalent to Feller's power supply problem. See
Feller [1957, pp. 420-421 and 436].
It  remains  to  examine  the  behavior  of  the  IEM  as  a  function  of  N,  the 
number  of  response  elements  within  an  individual.   There  are  two  extreme 
cases  of  interest:   N  very  small,  N  very  large. 

Case  1.   N  very  small. 

In  this  case  it  is  clear  that  an  individual's  response  probability  may 
only  take  on  a  few,  widely  spaced  values.   If  N=l,  he  may  only  have  the 
response  probability  values  X(t)  =  0  or  X(t)  =  1.   If  N=2,  he  may  only  have 
the values X(t) = 0, X(t) = 1/2, and X(t) = 1.¹⁵ A model which restricts the
response  probability  of  an  individual  to  such  discrete  values  is  limited  in 
its  appeal  as  a  model  of  choice  behavior.   It  is  more  appealing  to  have  a 
model  which  allows  a  response  probability  continuum.   Hence  we  now  turn  to  the 
case  where  the  number  of  response  elements  increases  without  limit. 

Case  2.   N  increases  without  limit. 

In the limit, the set of values an individual's response probability may take
on approaches a continuum. This, of course, is a more appealing situation than
that of Case 1. But the IEM breaks down in another sense in this case. If
we let N increase without limit in (4-5), we see that in the steady-state an
individual described by the IEM will have a response probability exactly equal
to $\alpha/(\alpha+\beta)$, since the variance goes to zero. Since by Axiom S1 we are assuming
that the same process in terms of $\alpha$ and $\beta$ holds for all the independent
respondents in the population, this further implies that the entire population
of respondents converges to the response probability $\alpha/(\alpha+\beta)$ in the steady-state
regardless of their starting state. Hence the model implies homogeneity of
response probability in the steady-state. Such a result is no more appealing


15 An individual's response probability may only take on N + 1 discrete values
if he has N response elements.
than  the  assumption  that  all  individuals  start  out  with  the  same  response 
probability,  as  has  generally  been  the  case  in  applications  of  linear  learning 
models. 

Fortunately,  it  is  possible  to  rectify  the  problems  noted  above  by 
postulating  a  process  in  which  there  are  cohesive  forces  between  the  elements. 
Such  a  model  is  developed  in  Section  V. 

Although we already know that the IEM is a rather unsatisfactory model of
consumer brand choice, nevertheless it is useful to examine the dynamics of the
degeneracy of this model for the infinite element case. By a derivation¹⁷
similar to that given in the Appendix for the Cohesive Elements Model, we
could show that the mean and variance of X(t) conditional on the initial
response probability X(0) are

(4-6)      $$E[X(t) \mid X(0)] = X(0)\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}\left[1 - e^{-(\alpha+\beta)t}\right]$$
and

(4-7)      $$\operatorname{Var}[X(t) \mid X(0)] = \operatorname{Var}[X(0) \mid X(0)]\,e^{-2(\alpha+\beta)t} = 0$$
respectively. From (4-6) and (4-7) we see that the infinite element specification
of the IEM implies that the response probability of an individual progresses
deterministically from X(0) to $\alpha/(\alpha+\beta)$ regardless of the initial value, X(0).
Thus, the degeneracy of the IEM lies in the fact that it implies that the
respondents, although perhaps initially heterogeneous with respect to their
individual response probabilities, deterministically converge to the same
response probability.
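The deterministic convergence implied by (4-6) and (4-7) is easy to see numerically. The sketch below simply evaluates (4-6) for two respondents with different starting values; the function name, the parameter values, and the starting values are hypothetical choices made for illustration.

```python
from math import exp


def iem_mean_response_probability(x0, alpha, beta, t):
    """Conditional mean of X(t) given X(0) = x0, as in equation (4-6)."""
    decay = exp(-(alpha + beta) * t)
    return x0 * decay + (alpha / (alpha + beta)) * (1.0 - decay)


# Illustrative values (assumptions): two respondents who start far apart.
alpha, beta = 0.3, 0.7
times = [0.0, 1.0, 5.0, 20.0]
low_start = [iem_mean_response_probability(0.1, alpha, beta, t) for t in times]
high_start = [iem_mean_response_probability(0.9, alpha, beta, t) for t in times]

# Gap between the two respondents at each time; by (4-6) it equals the
# initial gap multiplied by exp(-(alpha + beta) * t).
gaps = [hi - lo for lo, hi in zip(low_start, high_start)]
```

Since the conditional variance in (4-7) is zero, these conditional means are the response probabilities themselves in the infinite element case, so the shrinking `gaps` show the initial heterogeneity being erased at the rate $\alpha+\beta$.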


For  an  exception,  see  Massy  [1965].


17 See  Montgomery  [1966b,  Chapter  2]  for  the  derivation. 
V.   THE  COHESIVE  ELEMENTS  MODEL

In the previous section the IEM was shown to degenerate when the number of
response elements increases without limit. In the present section this situation
is rectified by the development of a birth-death process on the response
elements in which the elements are assumed to have a cohesive property.

Formulation and Steady-State Distribution

Once again, we must further specify the transition intensities of the
latent Markov process. For the Cohesive Elements Model (CEM) we retain (i)
and (ii) of the IEM specification from the previous section. However, we now
replace (iii) of that section by

(iii')      the transition intensity of each element is increased by an amount
$\gamma$ for each element associated with the opposite response.

Assumption  (iii')  might  be  thought  of  as  an  assumption  of  cohesion  or 
attraction  between  the  response  elements. 

Using (i), (ii), and (iii'), we are able to identify $\lambda_i$ and $\mu_i$,
the birth and death parameters in the General Model. These transition
intensities are

(5-1)      $\lambda_i$ = (single element transition intensity from B to A)
                       x (number of response elements in state B)

                     $= (\alpha + i\gamma)(N - i)$

and

(5-2)      $\mu_i = [\beta + (N - i)\gamma]\, i$

Substituting (5-1) and (5-2) into (3-1), the differential equations of the
General Model, we have

(5-3)      $\dfrac{dp_0(t)}{dt} = -N\alpha\, p_0(t) + [\beta + (N-1)\gamma]\, p_1(t)$   for i = 0

           $\dfrac{dp_i(t)}{dt} = -\{(N-i)(\alpha + i\gamma) + i[\beta + (N-i)\gamma]\}\, p_i(t)$
           $\qquad + (N-i+1)[\alpha + (i-1)\gamma]\, p_{i-1}(t)$
           $\qquad + (i+1)[\beta + (N-i-1)\gamma]\, p_{i+1}(t)$   for 0 < i < N

           $\dfrac{dp_N(t)}{dt} = -N\beta\, p_N(t) + [\alpha + (N-1)\gamma]\, p_{N-1}(t)$   for i = N

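Coleman [1964b, p. 345] reports this steady-state distribution as:

(5-4)      $p_i = \dbinom{N}{i} \dfrac{\Gamma[(\alpha/\gamma) + i]\;\Gamma[N + (\beta/\gamma) - i]\;\Gamma[(\alpha+\beta)/\gamma]}{\Gamma[\alpha/\gamma]\;\Gamma[N + (\alpha+\beta)/\gamma]\;\Gamma[\beta/\gamma]}$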

where we have once again dropped t from $p_i(t)$ since we are now considering
the steady-state distribution. Thus, at stochastic equilibrium the
distribution of i, the number of elements in state A, is given by (5-4).
Coleman terms this distribution the "contagious" binomial.
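As a numerical check on the steady-state result, the following Python sketch
evaluates (5-4) for illustrative parameter values. The values of N, alpha,
beta, and gamma are assumptions chosen only for the example, and the function
names are hypothetical. Under those assumptions it checks that the
probabilities sum to one and that they balance the birth and death intensities
of (5-1) and (5-2), i.e., that $\lambda_i p_i = \mu_{i+1} p_{i+1}$ for each i.

```python
from math import comb, exp, lgamma, log


def contagious_binomial_pmf(N, alpha, beta, gamma):
    """Steady-state probabilities p_0..p_N of (5-4), computed via log-gamma."""
    a, b = alpha / gamma, beta / gamma
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b) - lgamma(N + a + b)
    return [
        exp(log(comb(N, i)) + lgamma(a + i) + lgamma(N + b - i) + log_norm)
        for i in range(N + 1)
    ]


def birth_rate(i, N, alpha, gamma):
    """lambda_i of (5-1)."""
    return (alpha + i * gamma) * (N - i)


def death_rate(i, N, beta, gamma):
    """mu_i of (5-2)."""
    return (beta + (N - i) * gamma) * i


# Illustrative values only; they are not estimates from the paper.
N, alpha, beta, gamma = 20, 0.3, 0.7, 0.5
p = contagious_binomial_pmf(N, alpha, beta, gamma)

total = sum(p)
# Balance condition of a birth-death chain at equilibrium.
max_imbalance = max(
    abs(birth_rate(i, N, alpha, gamma) * p[i]
        - death_rate(i + 1, N, beta, gamma) * p[i + 1])
    for i in range(N)
)
# Expected share of elements in state A; should equal alpha / (alpha + beta).
mean_share = sum(i * p_i for i, p_i in enumerate(p)) / N
```

The log-gamma form is used only to avoid overflow for large N; it is
algebraically the same expression as (5-4).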

Recall from the previous section that our concern is with the steady-state
distribution of X = i/N, the probability that the individual consumer will
purchase Brand A rather than Brand B on any given purchase occasion. If we
allow N, the total number of response elements within a consumer, to go to
infinity, the response probability will become a continuous measure between
0 and 1, thus yielding a more satisfying model of the individual respondent or
consumer.

We seek the steady-state distribution of X, f(X). Using a result¹⁸ on the
limiting behavior of gamma functions when a term in the gamma argument
increases without limit, we could show¹⁹ that

(5-5)      $f(X) = \dfrac{\Gamma[(\alpha+\beta)/\gamma]}{\Gamma[\alpha/\gamma]\,\Gamma[\beta/\gamma]}\; X^{(\alpha/\gamma)-1}\,(1-X)^{(\beta/\gamma)-1}$

which is the beta distribution²⁰ having mean $\alpha/(\alpha+\beta)$ and
variance $\alpha\beta\gamma/[(\alpha+\beta)^2(\alpha+\beta+\gamma)]$.

It is clear from (5-5) that the CEM does not degenerate into a single
probability of purchasing Brand A in the steady-state as the number of
response elements increases without limit.

18 See Montgomery [1966b].
19 See Montgomery [1966b].
20 See Raiffa and Schlaifer [1961, p. 216].

Mean  Value  Function  and  the  Variance  of  the  Change  Process 

The next task is to develop relations which will describe the evolution of
X(t) through time. Recall that X(t) = i/N is the probability that the
individual will purchase Brand A at time t, where time indexes purchase
occasions. The procedure will be to develop equations for E[X(t)] and
Var[X(t)]. It will be seen that the variance of this process at any time
t > 0 does not degenerate as the number of response elements increases
without limit.

Consider E[X(t)], the mean value function²¹ of the stochastic process which
describes the evolution of X(t) through time. In the Appendix it is shown that

(5-6)      $\dfrac{dE[X(t)]}{dt} = \alpha - (\alpha+\beta)\,E[X(t)]$

which has the solution

(5-7)      $E[X(t)] = E[X(0)]\exp[-(\alpha+\beta)t] + \{\alpha/(\alpha+\beta)\}\{1 - \exp[-(\alpha+\beta)t]\}$

where  E[X(0)]  is  the  initial  condition. 

Suppose for some individual described by the CEM that we know X(0), his
probability of purchasing Brand A at time 0, i.e., at the first purchase
under consideration. Then E[X(0)] = X(0) and we may write (5-7) as:

(5-8)      $E[X(t) \mid X(0)] = X(0)\exp[-(\alpha+\beta)t] + \{\alpha/(\alpha+\beta)\}\{1 - \exp[-(\alpha+\beta)t]\}$

This result is useful when we aggregate individuals to form estimating
equations. Note that the mean value function of the CEM (5-8) is identical to
that of the IEM (4-6).
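The behavior of (5-8) can be illustrated with a short Python sketch. The
parameter values and the function name below are assumptions made for the
example only. Two individuals with very different starting probabilities are
followed over several purchase occasions; both expected paths approach
$\alpha/(\alpha+\beta)$.

```python
import math


def mean_response_probability(x0, alpha, beta, t):
    """E[X(t) | X(0) = x0] from (5-8)."""
    k = alpha + beta
    decay = math.exp(-k * t)
    return x0 * decay + (alpha / k) * (1.0 - decay)


# Illustrative values only; they are not estimates from the paper.
alpha, beta = 0.2, 0.6
occasions = (0, 1, 5, 50)

# Expected paths for a low and a high initial response probability.
paths = {
    x0: [mean_response_probability(x0, alpha, beta, t) for t in occasions]
    for x0 in (0.05, 0.90)
}
equilibrium = alpha / (alpha + beta)
```

Only the mean converges in the CEM; the variance result that follows is what
distinguishes it from the IEM.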


21 See Parzen [1962, Chapter 3] for a discussion of the mean value function
of a stochastic process.
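Using a procedure similar to that used in the Appendix to derive (5-6) and
(5-7), we could show that

(5-9)      $\mathrm{Var}[X(t)] = \dfrac{\alpha\beta\gamma}{(\alpha+\beta)^2(\alpha+\beta+\gamma)}$

           $\quad + \dfrac{2\gamma(\alpha-\beta)}{(\alpha+\beta)(\alpha+\beta+2\gamma)} \left\{\dfrac{\alpha}{\alpha+\beta} - E[X(0)]\right\} \exp[-(\alpha+\beta)t]$

           $\quad + \left\{E[X^2(0)] - \dfrac{\alpha(\alpha+\gamma)}{(\alpha+\beta)(\alpha+\beta+\gamma)} - \dfrac{2(\alpha+\gamma)\{(\alpha+\beta)E[X(0)] - \alpha\}}{(\alpha+\beta)(\alpha+\beta+2\gamma)}\right\} \exp[-2(\alpha+\beta+\gamma)t]$

           $\quad - \left\{E^2[X(0)] - [2\alpha/(\alpha+\beta)]\,E[X(0)] + [\alpha/(\alpha+\beta)]^2\right\} \exp[-2(\alpha+\beta)t].$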
See Montgomery [1966a, Chapter 3] for the derivation of (5-9). We see from
(5-9) that the evolution of an individual's probability of purchasing Brand A
is truly stochastic. In the steady-state the variance is

(5-10)     $\mathrm{Var}[X] = \dfrac{\alpha\beta\gamma}{(\alpha+\beta)^2(\alpha+\beta+\gamma)}$
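A small Python sketch may help in reading (5-9). It codes the four terms as
they are read here: a constant, and terms decaying at rates
$\alpha+\beta$, $2(\alpha+\beta+\gamma)$, and $2(\alpha+\beta)$. The function
name and the numerical values are assumptions for illustration. Two sanity
checks are built in: at t = 0 the expression should return the initial
variance $E[X^2(0)] - E^2[X(0)]$, and for large t it should approach (5-10).

```python
import math


def cem_variance(t, m0, s0, alpha, beta, gamma):
    """Var[X(t)] as read from (5-9); m0 = E[X(0)], s0 = E[X^2(0)]."""
    k = alpha + beta
    steady = alpha * beta * gamma / (k ** 2 * (k + gamma))        # (5-10)
    c1 = (2 * gamma * (alpha - beta) / (k * (k + 2 * gamma))
          * (alpha / k - m0))
    c2 = (s0
          - alpha * (alpha + gamma) / (k * (k + gamma))
          - 2 * (alpha + gamma) * (k * m0 - alpha) / (k * (k + 2 * gamma)))
    c3 = m0 ** 2 - 2 * (alpha / k) * m0 + (alpha / k) ** 2
    return (steady
            + c1 * math.exp(-k * t)
            + c2 * math.exp(-2 * (k + gamma) * t)
            - c3 * math.exp(-2 * k * t))


# Illustrative values only. X(0) = 0.4 is taken as known, so Var[X(0)] = 0.
alpha, beta, gamma = 0.2, 0.6, 0.5
m0, s0 = 0.4, 0.16

variance_at_start = cem_variance(0.0, m0, s0, alpha, beta, gamma)
variance_in_limit = cem_variance(200.0, m0, s0, alpha, beta, gamma)
steady_state = alpha * beta * gamma / ((alpha + beta) ** 2 * (alpha + beta + gamma))
```

With a known starting probability the variance begins at zero and grows
toward the steady-state value, which is the sense in which the process is
"truly stochastic".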
The  Diffusion  Limit  of  the  Process 

The  stochastic  process  {X(t) ;  t  >  0}  may  be  of  two  types:   discontinuous 
or  diffusion.   Discontinuous  processes  are  characterized  by  the  fact  that  the 
probability  of  a  change  in  the  interval  (t,  t+h)  where  h  is  very  small  is  of 
the  order  of  magnitude  of  h.   However,  in  a  discontinuous  process,  when  a  change 
does  occur,  it  is  finite  in  magnitude.   In  contrast  to  processes  of  the 
discontinuous  type  are  diffusion  processes  in  which  X(t)  changes  continuously. 
In  diffusion  processes,  no  matter  how  small  the  interval  (t,t+h),  X(t)  will 
undergo  some  change.   This  change  is  practically  certain  to  be  small  for  small 
intervals.   In  formal  terms  this  says  that 
           $P\{|X(t+h) - X(t)| > \varepsilon\} = o(h)$.

For a diffusion process the infinitesimal mean displacement a(X) and the
infinitesimal variance 2b(X) play a fundamental role. These quantities are
defined as:

(5-11)     $\lim_{h \to 0} \dfrac{E[X(t+h) - X(t) \mid X(t)]}{h} = a(X)$

(5-12)     $\lim_{h \to 0} \dfrac{\mathrm{Var}[X(t+h) - X(t) \mid X(t)]}{h} = 2b(X)$

Using the expectation procedure illustrated in the Appendix, we could show that
the infinitesimal mean and variance of the CEM are given by

(5-13)     $a(X) = \alpha - (\alpha+\beta)X(t)$

and

(5-14)     $2b(X) = 2\gamma[X(t) - X^2(t)]$,

respectively. See Montgomery [1966a, Chapter 3] for the derivation.
Let f(t,X) denote the probability density function of X(t). Kolmogorov has
shown that f(t,X) must satisfy the Fokker-Planck diffusion equation

(5-15)     $\dfrac{\partial f(t,X)}{\partial t} = \dfrac{\partial^2 \{b(X) f(t,X)\}}{\partial X^2} - \dfrac{\partial \{a(X) f(t,X)\}}{\partial X}$

where a(X) and b(X) are the quantities defined in (5-11) and (5-12).

The CEM may be represented by a diffusion process as the number of response
elements, N, increases without limit. In this case an individual has an
infinite number of response elements undergoing the change postulated in the
assumed process. Recall that it was noted that the CEM has a steady-state
distribution of response probabilities which follows a beta law. This beta
distribution must satisfy the Fokker-Planck diffusion equation (5-15) in the
steady-state, that is, when $\partial f(t,X)/\partial t = 0$. Using (5-13) and
(5-14) it is readily shown that (5-5) satisfies (5-15). In view of the above
discussion we might refer to the infinite element form of the CEM as a
probability diffusion model since a respondent's probability of making
response A changes in time in accordance with the Fokker-Planck diffusion
equation.
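The steady-state claim can be checked numerically. Setting the time
derivative in (5-15) to zero and integrating once gives the condition that
$a(X) f(X) - d\{b(X) f(X)\}/dX$ is constant, and for a density confined to the
unit interval that constant is taken to be zero. The Python sketch below
evaluates this quantity for the beta density (5-5), using a central difference
for the derivative. The parameter values, step size, and function names are
assumptions for the example.

```python
from math import exp, lgamma, log


def beta_density(x, alpha, beta, gamma):
    """f(X) of (5-5): a beta density with parameters alpha/gamma, beta/gamma."""
    p, q = alpha / gamma, beta / gamma
    log_norm = lgamma(p + q) - lgamma(p) - lgamma(q)
    return exp(log_norm + (p - 1) * log(x) + (q - 1) * log(1 - x))


def drift(x, alpha, beta):
    """a(X) of (5-13)."""
    return alpha - (alpha + beta) * x


def half_infinitesimal_variance(x, gamma):
    """b(X), i.e., one half of 2b(X) in (5-14)."""
    return gamma * x * (1 - x)


def probability_flux(x, alpha, beta, gamma, h=1e-5):
    """a(X) f(X) - d/dX [b(X) f(X)], derivative by central difference."""
    def bf(y):
        return half_infinitesimal_variance(y, gamma) * beta_density(y, alpha, beta, gamma)
    derivative = (bf(x + h) - bf(x - h)) / (2 * h)
    return drift(x, alpha, beta) * beta_density(x, alpha, beta, gamma) - derivative


# Illustrative values only; they are not estimates from the paper.
alpha, beta, gamma = 0.6, 0.9, 0.4
fluxes = [probability_flux(x, alpha, beta, gamma) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]
```

Values of `fluxes` that are zero up to discretization error are consistent
with (5-5) being a stationary solution of (5-15).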


22 See Feller [1950] or Prabhu [1965].
Aggregation

Thus far, consideration has been given to the process at the level of the
individual respondent. In order to obtain estimating equations for the model,
it is necessary to aggregate responses cross-sectionally in time over
individual respondents who behave according to the same process, i.e., have
the same $\alpha$, $\beta$, and $\gamma$. Perhaps it is well to note just what
we mean when we say that all individuals are assumed to behave in accordance
with the same process. What we mean by this is that the parameters of the
change process, $\alpha$, $\beta$, and $\gamma$, are the same for each
individual respondent in the population. This is not at all the same thing as
saying that all individuals have the same response probability. In fact, the
CEM allows for heterogeneity among respondents with respect to their response
probabilities, X(t), even after the process has reached statistical
equilibrium or the steady-state. This point is clear from the development in
the section on the mean value function.

In the remainder of this section we shall utilize the following additional
notation:

M           = number of individual respondents

$N_k$       = the number of response elements within individual k which
              are associated with response A, k = 1, 2, ..., M.

$X_k(t) = N_k/N$
            = the probability that individual k, having $N_k$ elements out of
              N associated with response A at time t, will make response A
              at time t.

f[X(t)]     = the probability density function of the population of
              individuals with respect to their response probability,
              X(t), at time t. That is, f[X(t)] is just the distribution
              of the response probability in the population.

The results of this section are free of any assumption as to the form of
f[X(t)].

In the CEM the data form a sequence of 0's and 1's where a 1 denotes the
occurrence of response A. In a marketing context such a sequence might
represent the sequence of brands purchased. Response A might be the purchase
of the brand of primary concern in the study while B would be the purchase of
any other brand.

From  the  data  we  may  tabulate  the  proportion  of  individuals  who  made 
response  1  at  time  t  and  the  proportion  of  individuals  who,  having  made  response 
1  at  time  0,  again  make  response  1  at  time  t.   These  empirical  proportions  will 
be  denoted  by  Q(t)  and  Q(0,t),  respectively.   In  Lemma  1  and  Lemma  2  below  it  is 
shown  that  Q(t)  and  Q(0,t)  are  unbiased  estimates  of  P(t)  and  P(0,t)  which  are 
the  theoretical  expected  response  proportions.   The  objective  at  this  point  is 
to  see  whether  or  not  it  might  be  possible  to  relate  the  observed  proportions  to 
the  latent  Markov  process  in  such  a  way  as  to  obtain  estimates  of  the 
parameters of the CEM. The relations established in the two lemmas proved
below  will  provide  the  foundation  for  the  derivation  of  estimating  equations. 

Lemma 1.   If P(t) denotes the expected proportion of the M respondents
making response A at time t, then

(5-16)     $P(t) = E[X(t)] = \int_0^1 E[X(t) \mid X(0)]\, f[X(0)]\, dX(0)$

and the observed proportion Q(t) is an unbiased estimate of P(t) = E[X(t)].

Proof.   From probability theory²³ we know that

(5-17)     $E[X(t)] = \int_0^1 E[X(t) \mid X(0)]\, f[X(0)]\, dX(0)$

which is just (5-16). We now show that

(5-18)     $E[Q(t)] = P(t) = E[X(t)]$

which along with (5-17) will establish Lemma 1. Consider any individual
respondent k. At time t, he has probability $X_k(t)$ of making response A.

23 See Parzen [1960, page 384].
Now let

           $r_k(t) = 0$ if individual k purchases Brand B at time t
                  $= 1$ if individual k purchases Brand A at time t

It is easily seen that $E[r_k(t)] = X_k(t)$. We now aggregate across all M of
the individuals in the population. In this case

           $E\left[\sum_{k=1}^{M} r_k(t)\right] = \sum_{k=1}^{M} E[r_k(t)] = \sum_{k=1}^{M} X_k(t) = M\,E[X(t)]$

is the expected number who make response A at time t. The expected proportion
clearly is just

           $E[Q(t)] = E\left[\dfrac{1}{M}\sum_{k=1}^{M} r_k(t)\right] = \dfrac{1}{M}\sum_{k=1}^{M} X_k(t) = E[X(t)] = P(t)$

The latter equality follows by virtue of the definition of P(t) as the
expected proportion making response A at time t, i.e., E[Q(t)]. This
establishes (5-18) which along with (5-17) completes the proof of the lemma.

Lemma 2.   If P(0,t) denotes the expected proportion of the M respondents
making response A at time 0 and A again at time t, then

(5-19)     $P(0,t) = E[X(0,t)] = \int_0^1 E[X(t) \mid X(0)]\, X(0)\, f[X(0)]\, dX(0)$

and the observed proportion Q(0,t) is an unbiased estimate of
P(0,t) = E[X(0,t)].

Proof.   Using certain results from probability theory²⁴ and the fact that
X(0,t) = X(0)X(t) by the Pseudo-Bernoulli Trials Axiom R2, we have

(5-20)     $E[X(0,t)] = E[X(0)X(t)] = \int_0^1 E[X(0)X(t) \mid X(0)]\, f[X(0)]\, dX(0)$

                     $= \int_0^1 E[X(t) \mid X(0)]\, X(0)\, f[X(0)]\, dX(0)$

24 See Parzen [1960, page 384].

It remains to show that

(5-21)     $E[Q(0,t)] = P(0,t) = E[X(0,t)]$

Consider any individual k. His probabilities of making response A at times 0
and t are $X_k(0)$ and $X_k(t)$. Recall that under the assumptions of the
model, successive responses of a given individual are independent. That is,
using the definition of $r_k(t)$ given in Lemma 1, the independence of the
trials (responses) implies that

           $P[r_k(t) \mid r_k(0)] = P[r_k(t)]$.

Hence the probability of the compound event $r_k(0)\,r_k(t) = 1 \cdot 1 = 1$ is
just $X_k(0)X_k(t) = X_k(0,t)$. The occurrence or non-occurrence of the
compound event $r_k(0)\,r_k(t) = 1$ is a Bernoulli process with Bernoulli
parameter $X_k(0,t)$. Hence

           $E[r_k(0)\,r_k(t) \mid X_k(0,t)] = X_k(0,t)$.

Now for a population of M independent individuals we have

           $E\left[\sum_{k=1}^{M} r_k(0)\,r_k(t)\right] = \sum_{k=1}^{M} E[r_k(0)\,r_k(t)] = \sum_{k=1}^{M} X_k(0,t) = M\,E[X(0,t)]$

The expected proportion clearly is

           $E[Q(0,t)] = E\left[\dfrac{1}{M}\sum_{k=1}^{M} r_k(0)\,r_k(t)\right] = \dfrac{1}{M}\,E\left[\sum_{k=1}^{M} r_k(0)\,r_k(t)\right]$

                     $= \dfrac{1}{M}\sum_{k=1}^{M} X_k(0,t) = E[X(0,t)] = P(0,t)$.

This establishes (5-21) which together with (5-20) establishes Lemma 2.
VI.   ESTIMATING  AND  TESTING  THE  CEM

If the CEM is ever to have any practical relevance to the management
scientist, to say nothing of the marketing manager, estimation procedures and
methods for testing the model must be developed.²⁵ This section presents some
results that further the goals of estimation, testing, and comparison of the
CEM with alternative stochastic models of consumer brand choice.

6.1  Estimating  Equations

We now develop estimating equations using Lemmas 1 and 2 and the results of
Section 5.2. Using (5-8) in (5-16) we find

(6-1)      $P(t) = \int_0^1 E[X(t) \mid X(0)]\, f[X(0)]\, dX(0)$

                $= \int_0^1 \{X(0)\exp\{-(\alpha+\beta)t\} + [\alpha/(\alpha+\beta)][1 - \exp\{-(\alpha+\beta)t\}]\}\, f[X(0)]\, dX(0)$

                $= P(0)\exp\{-(\alpha+\beta)t\} + [\alpha/(\alpha+\beta)][1 - \exp\{-(\alpha+\beta)t\}]$

                $= P(0)\exp\{-kt\} + a[1 - \exp\{-kt\}]$

where we let $a = \alpha/(\alpha+\beta)$ and $k = \alpha + \beta$. We see from
(6-1) that "a" is the equilibrium expected choice share for Brand A while "k"
is the rate at which the model approaches the steady-state.²⁷ Similarly, if we
use (5-8) in (5-19), we have

(6-2)      $P(0,t) = P(0,0)\exp\{-kt\} + a\,P(0)[1 - \exp\{-kt\}]$


25 While Coleman [1964a, pp. 96-7] recognized the need for better estimation
procedures and for methods of testing the model, he left the development of
these methods and procedures to future research. In most instances he seemed
content with methods which would just identify the parameters. In no case did
he consider the properties of the estimates or the adequacy of the fit of the
model to empirical data.

26 See Massy, Montgomery, and Morrison [1967, Chapter 2] for a more detailed
discussion of the comparison of alternative models.

27 The parameter k is analogous to the response constant of an electric
circuit.
It is necessary at this point to consider what we mean by P(0,0). By
reasoning analogous to the proof of Lemma 2 we have $P(0,0) = E[X^2(0)]$. Hence
it is the second raw moment of the initial distribution of response probability
across the population of consumers.
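The estimating equations (6-1) and (6-2) are simple enough to tabulate
directly. The Python sketch below does so for a few purchase occasions. The
function names are hypothetical and the parameter values are assumed for
illustration only; they are not the paper's estimates. The starting values
P(0) and P(0,0) are chosen to respect $P^2(0) \le P(0,0) \le P(0)$, which any
second raw moment of a probability must satisfy.

```python
import math


def expected_share(t, p0, a, k):
    """P(t) from (6-1): expected proportion making response A at time t."""
    decay = math.exp(-k * t)
    return p0 * decay + a * (1.0 - decay)


def expected_repeat_share(t, p0, p00, a, k):
    """P(0,t) from (6-2): expected proportion making A at time 0 and at time t."""
    decay = math.exp(-k * t)
    return p00 * decay + a * p0 * (1.0 - decay)


# Illustrative values only: P(0) = 0.10, P(0,0) = E[X^2(0)] = 0.04,
# equilibrium share a = 0.30, adjustment rate k = 0.25.
p0, p00, a, k = 0.10, 0.04, 0.30, 0.25

shares = [expected_share(t, p0, a, k) for t in range(6)]
repeats = [expected_repeat_share(t, p0, p00, a, k) for t in range(6)]

# Long-run limits: P(t) tends to a, and P(0,t) tends to a * P(0).
long_run_share = expected_share(1000, p0, a, k)
long_run_repeat = expected_repeat_share(1000, p0, p00, a, k)
```

Note that P(0,t) tends to $a \cdot P(0)$: in the long run the response at
time t carries no information about the response at time 0.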

6.2  Parameter  Bounds

Before developing the minimum chi square procedure discussed in the following
section, we consider inherent constraints on the parameters. These constraints
will prove useful in the numeric minimization of the chi square statistic.
Recall from Axioms M-1 to M-4 that $\alpha$ and $\beta$ represent transition
intensities in opposite directions. We suffer no loss of generality if we
require $\alpha$ and $\beta$ to be non-negative so that

(6-3)      $0 \le a = \alpha/(\alpha+\beta) \le 1$

and

(6-4)      $k = \alpha + \beta \ge 0$

If k = 0, none of the consumers undergoes any change in response probability
through time and the model implies a stationary heterogeneous Bernoulli
process.²⁸

28 See Morrison [1965b] for methods for estimating and testing heterogeneous,
stationary Bernoulli models.

It remains to consider bounds on P(0,0), the second raw moment of the
initial distribution of response probability. The following Lemma is proved
in [Montgomery, 1966a, pp. 108-110]:

Lemma 3.

(6-5)      $P^2(0) \le P(0,0) \le P(0)$.

It is interesting to note that if $P(0,0) = P^2(0)$, then Var[X(0)] = 0 and all
individuals in the population have the same initial probability of making
response A. If P(0,0) = P(0), then all respondents are clustered at the two
extreme initial response probabilities, X(0) = 0 and X(0) = 1, with the
proportion of the respondents having X(0) = 1 equal to P(0). For the strict
inequalities

           $P^2(0) < P(0,0) < P(0)$

the individual respondents are distributed across the response probability
continuum.
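The bounds of Lemma 3 translate directly into a feasibility check on
candidate values of P(0,0). The Python sketch below is illustrative; the
function name and the small numerical tolerance are assumptions. It returns
the implied initial variance $\mathrm{Var}[X(0)] = P(0,0) - P^2(0)$ and rejects
values outside the bounds of (6-5). The three calls correspond to the three
cases just described.

```python
def initial_variance(p0, p00, tol=1e-12):
    """Var[X(0)] = P(0,0) - P(0)^2, after checking the bounds in (6-5).

    tol absorbs floating-point rounding at the boundaries (an assumption
    of this sketch, not part of the lemma).
    """
    if not (p0 * p0 - tol <= p00 <= p0 + tol):
        raise ValueError("P(0,0) must lie between P(0)^2 and P(0)")
    return max(0.0, p00 - p0 * p0)


# Lower bound: every respondent has the same X(0), so the variance is zero.
homogeneous = initial_variance(0.2, 0.04)

# Upper bound: respondents sit at X(0) = 0 or X(0) = 1, variance P(0)[1 - P(0)].
two_point = initial_variance(0.2, 0.2)

# Strict inequalities: respondents are spread over the unit interval.
interior = initial_variance(0.2, 0.10)
```

In a grid search over P(0,0), the same inequality supplies the lower and upper
end points of the search interval once P(0) is fixed.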

Now let

(6-6)      Q(t) = sample proportion making response A at time t

(6-7)      Q(0,t) = sample proportion making response A at time 0 and again
                    at time t

The properties of Q(t) as an estimate of the population parameter P(t) may be
summarized as follows:

1.  Q(t) is an unbiased estimate of P(t).

2.  Q(t) converges in probability to P(t).

3.  Under certain reasonable conditions, Q(t) is asymptotically normal.

Similar results hold for Q(0,t).

Item 1 follows from Lemma 1. For Item 2 see Cramer [1946, p. 207]. Item 3
follows when Liapounoff's conditions are satisfied by our heterogeneous
population.²⁹

Cramer [1946, pp. 217-8] has shown that a sufficient condition for the
sample proportion of a heterogeneous Bernoulli population to be asymptotically
normal is that no member of the population has a probability of exactly 0 or
exactly 1 of making response A. This condition, while only sufficient and not
necessary for asymptotic normality of Q(t), would not be restrictive in
applications of the CEM to human populations, particularly consumer
populations.

29 See Cramer [1946, pp. 215-16]. There are more general cases of the central
limit theorem. See Parzen [1960, Chpt. 10].

6.3  Minimum  Chi  Square  Estimation

Our objective in this section is to formulate a minimum chi square procedure
for estimating and testing the CEM.³⁰ The minimum chi square procedure
provides a test of the model in the "goodness of fit" sense as well as
parameter estimates which have desirable asymptotic properties as the number
of respondents increases.

In Section 5 we have seen that the CEM yields expected response proportions
P(t) and P(0,t) as a function of four parameters: a, k, P(0), and P(0,0).
The corresponding sample proportions have been denoted by Q(t) and Q(0,t). A
measure of the deviations of the sample proportions from the expected
proportions given by the model is the well known "goodness of fit" statistic


(6-8)      $\chi^2 = \sum_{i=1}^{k} \dfrac{[O_i - E_i(\underline{A})]^2}{E_i(\underline{A})}$

where i indexes a set of k mutually exclusive and collectively exhaustive
response categories, $O_i$ is the observed number of respondents in response
category i, and $E_i(\underline{A})$ is the expected number of respondents in
response category i. In this case the $E_i(\underline{A})$ (i = 1, ..., k) are
generated by the model, which has the parameter vector $\underline{A}$. Note
that the statistic given in (6-8) is a weighted linear combination of the
squared deviation of the model expectations from the data, where the weights
are the reciprocals of the expected cell frequencies. The statistic in (6-8)
is known to have an approximate chi square distribution under the null
hypothesis that the model generated the data.

If we minimize (6-8) with respect to the parameter vector $\underline{A}$, it
is known³¹ that the statistic remains approximately chi square (with a loss of
one degree of freedom for each parameter estimated) and the estimate of
$\underline{A}$ will have desirable statistical properties. The estimate of
$\underline{A}$ will be the best asymptotically normal estimate of
$\underline{A}$ among the class of consistent and asymptotically normal
estimators.³²

30 Note that a minimum chi square procedure could be developed for other cases
of the General Latent Markov Model, that is, for alternative specifications of
$\lambda_i$ and $\mu_i$.

31 See Cramer [1946, pp. 424-434 and p. 506] and Neyman [1949].

For  the  null  hypothesis  that  the  CEM  generated  a  set  of  responses  from  M 

independent  individuals,  the  expected  response  proportions  P(t)  and  P(0,t)  are 

given  by  (6-1)  and  (6-2)  respectively.   A  convenient  format  in  which  to 

discuss  the  expected  and  observed  response  proportions  between  time  0  and  time 

t  is  that  given  in  Table  6-1. 

Table  6-1

EXPECTED  AND  OBSERVED  RESPONSE  PROPORTIONS  AT  TIMES  0  AND  t

Expected Response Proportions

                                  Time t
                      A                  B                            Total
Time 0     A          P(0,t)             P(0) - P(0,t)                P(0)
           B          P(t) - P(0,t)      1 - P(0) - P(t) + P(0,t)     1 - P(0)
           Total      P(t)               1 - P(t)                     1.0

Observed Response Proportions

                                  Time t
                      A                  B                            Total
Time 0     A          Q(0,t)             Q(0) - Q(0,t)                Q(0)
           B          Q(t) - Q(0,t)      1 - Q(0) - Q(t) + Q(0,t)     1 - Q(0)
           Total      Q(t)               1 - Q(t)                     1.0


32 Strictly speaking, certain regularity conditions must be imposed upon the
class of consistent and asymptotically normal estimators in order to avoid
certain rather pathological estimator candidates. The interested reader is
referred to Rao [1963] and [1965, Chapter 5].

We shall use tables of the form given in Table 6-1 as a basic unit in
developing the chi square procedure. For the observed and expected proportions
in Table 6-1 and a sample size of M respondents we may form the statistic:

(6-9)      $\chi^2(0,t) = M\,[\,\{Q(0,t) - P(0,t)\}^2 / P(0,t)$
           $\qquad + \{Q(0) - Q(0,t) - P(0) + P(0,t)\}^2 / \{P(0) - P(0,t)\}$
           $\qquad + \{Q(t) - Q(0,t) - P(t) + P(0,t)\}^2 / \{P(t) - P(0,t)\}$
           $\qquad + \{-Q(0) - Q(t) + Q(0,t) + P(0) + P(t) - P(0,t)\}^2 / \{1 - P(0) - P(t) + P(0,t)\}\,]$

where P(t) and P(0,t) are given as functions of the parameters P(0), P(0,0),
a, and k in (6-1) and (6-2).
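The statistic (6-9) is a sum over the four cells of one 2 x 2 table, which
the following Python sketch makes explicit. The function name and the sample
figures are assumptions for illustration; in practice the expected proportions
would come from (6-1) and (6-2) for a trial set of parameter values.

```python
def table_chi_square(M, q0, qt, q0t, p0, pt, p0t):
    """Chi square of (6-9) for one table of the form of Table 6-1.

    q0, qt, q0t are the observed proportions Q(0), Q(t), Q(0,t);
    p0, pt, p0t are the expected proportions P(0), P(t), P(0,t).
    """
    # Cell order: (A at 0, A at t), (A, B), (B, A), (B, B).
    observed = (q0t, q0 - q0t, qt - q0t, 1.0 - q0 - qt + q0t)
    expected = (p0t, p0 - p0t, pt - p0t, 1.0 - p0 - pt + p0t)
    return M * sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical sample of M = 1000 respondents.
chi2_example = table_chi_square(
    M=1000, q0=0.10, qt=0.12, q0t=0.05, p0=0.10, pt=0.11, p0t=0.045
)

# A model that reproduces the observed table exactly gives a statistic of zero.
chi2_perfect = table_chi_square(
    M=1000, q0=0.10, qt=0.12, q0t=0.05, p0=0.10, pt=0.12, p0t=0.05
)
```

Writing the cells out this way also shows why every expected cell must be
strictly positive for the statistic to be defined.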

The  question  arises  as  to  how  many  degrees  of  freedom  are  associated 
with  such  a  chi  square  statistic.   In  tabulations  such  as  Table  6-1  we  shall 
use  Q(0)  as  our  estimate  of  P(0),  a  procedure  which  is  justified  below.   In 
essence  then,  we  are  considering  Q(0)  as  fixed.   Since  the  cell  proportions 
must  sum  to  one  and  since  Q(0)  is  fixed,  the  array  of  observed  response 
proportions  given  in  Table  6-1  has  two  degrees  of  freedom.   We  note  that  one 
such  tabulation  will  not  be  sufficient  to  estimate  and  test  the  CEM 
since  there  remain  three  unknown  parameters  (P(0,0),  a,  and  k)  which  must  be 
estimated  and  one  such  table  has  only  two  degrees  of  freedom.   Thus  the  model 
will  be  underdetermined  for  only  one  such  tabulation. 

Recall that by our axioms the observed responses will be cross-sectionally
and temporally independent under the model hypothesis. Under this hypothesis
tables for 0 and t (t = 1, ..., T) will be independent and by the additivity
property of independent chi squares we may achieve a sufficient number of
degrees of freedom to test the model by summing the chi squares and degrees of
freedom for several such tables. If we have a sequence of T + 1 responses from
M independent respondents, we may form the statistic

(6-10)     $\chi^2 = \sum_{t=1}^{T} \chi^2(0,t)$,

where $\chi^2(0,t)$ is given by (6-9). When we estimate the parameters P(0,0),
a, and k by minimizing the statistic (6-10), $\chi^2$ will be approximately chi
square with 2T - 3 degrees of freedom. As noted above the estimates of P(0,0),
a, and k will have desirable statistical properties and we may test the
"goodness of fit" of the CEM to the data by comparing (6-10) to a chi square
statistic having 2T - 3 degrees of freedom.

It remains to justify our use of Q(0) as an estimate of the parameter P(0).
As M, the number of independent respondents, increases it is known³⁴ that Q(0)
is asymptotically normally distributed with mean P(0) and variance
[P(0){1 - P(0)} - Var[X(0)]]/M. For sufficiently large M, our observed Q(0) is
a sample of size one from a normal distribution with the mean and variance
given above. Since the sample mean from a normal distribution is a maximum
likelihood estimator of the population mean, Q(0) will asymptotically be a
maximum likelihood estimator of P(0). The statistic given in (6-10) will
remain asymptotically chi square when an MLE is substituted for one of the
parameters.

34 See Cramer [1946, p. 207] and Kendall and Stuart [1958, pp. 126-7] as well
as the references for Section 6.2.

35 See Cramer [1946, p. 505]. In fact, any efficient consistent asymptotically
normal estimator may be used and the distribution will remain asymptotically
chi square.

Examination of (6-10) and its associated equations quickly indicates that
minimization of (6-10) with respect to P(0,0), a, and k is analytically
intractable. However, the availability of high speed digital computers
(particularly time-shared systems) renders numeric minimization feasible.
While sophisticated search heuristics could be used, the approach followed in
this study was that known as grid search. We first define a grid in the three
dimensional parameter space and then compute (6-10) for each parameter
combination in that grid. We may continue to search successively finer grids
until we have determined the minimizing values of P(0,0), a, and k to the
desired degree of accuracy. The parameter bounds developed in Section 6.2 are
particularly helpful in bounding our search grid. While the grid search
procedure was found to be somewhat costly in terms of both computer time and
calendar time³⁶ in batch processing operations, it has been found to be quite
efficient on a time-sharing computer system where the researcher may specify
search grids interactively with the program.
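The grid search just described can be sketched in Python as follows. This is
a minimal illustration under stated assumptions, not the program used in the
study: the function names, the number of grid points, the number of
refinement rounds, and the upper limit `k_max` on k are all hypothetical
choices (the text gives only k >= 0). The "data" are synthetic proportions
generated from (6-1) and (6-2) at known parameter values, so that the search
can be seen to recover them. P(0) is fixed at Q(0), as in the text.

```python
import itertools
import math


def model_proportions(t, p0, p00, a, k):
    """(P(t), P(0,t)) from (6-1) and (6-2)."""
    decay = math.exp(-k * t)
    return p0 * decay + a * (1 - decay), p00 * decay + a * p0 * (1 - decay)


def total_chi_square(params, data, M, q0):
    """Statistic (6-10): the sum over t of the table statistics (6-9)."""
    p00, a, k = params
    total = 0.0
    for t, (qt, q0t) in data.items():
        pt, p0t = model_proportions(t, q0, p00, a, k)
        observed = (q0t, q0 - q0t, qt - q0t, 1 - q0 - qt + q0t)
        expected = (p0t, q0 - p0t, pt - p0t, 1 - q0 - pt + p0t)
        if min(expected) <= 0:          # statistic undefined for empty cells
            return math.inf
        total += M * sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return total


def linspace(lo, hi, n):
    return [lo + (hi - lo) * j / (n - 1) for j in range(n)]


def grid_search(data, M, q0, k_max=2.0, points=11, rounds=4):
    """Search successively finer grids over (P(0,0), a, k)."""
    # Bounds from Lemma 3, (6-3), and (6-4); k_max is an assumed cap.
    limits = [(q0 * q0, q0), (0.0, 1.0), (0.0, k_max)]
    bounds = list(limits)
    best = None
    for _ in range(rounds):
        axes = [linspace(lo, hi, points) for lo, hi in bounds]
        best = min(itertools.product(*axes),
                   key=lambda params: total_chi_square(params, data, M, q0))
        # Next grid: one old grid step either side of the best point.
        steps = [(hi - lo) / (points - 1) for lo, hi in bounds]
        bounds = [(max(lim_lo, x - s), min(lim_hi, x + s))
                  for x, s, (lim_lo, lim_hi) in zip(best, steps, limits)]
    return best, total_chi_square(best, data, M, q0)


# Synthetic, noise-free proportions for T = 5 tables (illustration only).
true_params = (0.028, 0.3, 0.4)          # P(0,0), a, k
q0 = 0.1
data = {t: model_proportions(t, q0, *true_params) for t in range(1, 6)}

best, chi2 = grid_search(data, M=1000, q0=q0)
degrees_of_freedom = 2 * len(data) - 3
```

A zooming grid of this kind can settle on a local minimum when the first grid
is too coarse, which is one reason for specifying the grids interactively.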

6.4  Some  Empirical  Results

The question arises as to whether the CEM is an empirically viable model.
That is, will it describe data? In this section we present the results of an
initial empirical test of the model. Our attention will focus upon the "fit"
of the model.³⁷

The data base for this test is composed of the dentifrice purchase records
for January, 1958 to April, 1963 from the M.R.C.A. National Consumer Panel.³⁸
These data span the August 1, 1960 endorsement of Crest by the American Dental
Association, an endorsement which had considerable market impact.³⁹

36 Computer time dropped from a range of five to twenty-five minutes down to
an average of about two minutes while calendar time dropped from eight days to
three hours. In addition, the estimates computed on time-sharing were from a
finer search grid.

37 Interpretation of the parameter estimates in this and other market
situations will be treated in subsequent papers and in Massy, Montgomery, and
Morrison [1967].

38 I am indebted to Dr. I. J. Abrams of M.R.C.A. for making these data
available at nominal cost and to the Stanford Graduate School of Business for
supplying the funds to obtain these data. For an evaluation of consumer panel
data see Boyd and Westfall [1960].

There  are  several  reasons  for  choosing  this  market  situation  as  an 
initial  test  of  the  model.   First,  the  A.D.A.  endorsement  of  Crest  allows  us  to 
segment  the  response  data  into  before  and  after  periods.   The  before  period 
represents  a  relatively  normal  market  situation  where  the  dynamics  of  response 
probability  exhibit  no  particularly  strong  trend.   On  the  other  hand,  the 
significant  market  impact  of  the  A.D.A.  endorsement  of  Crest  gives  us  an 
after  period  in  which  the  market  was  in  a  state  of  rapid  transition.   In  the 
latter  case,  the  response  probability  of  many  individuals  was  undergoing  very 
rapid  and  significant  change.   Thus  the  dentifrice  market  in  the  periods  just 
before  and  just  after  the  Crest  endorsement  affords  us  the  opportunity  to  test 
the  model  in  both  a  "normal"  and  a  "transient"  period  for  the  same  product 
class and sample of households.⁴⁰

The second advantage is that the brand of interest, Crest, was already
an established brand at the time of the A.D.A. endorsement in August 1960.
Hence, in both the before and after periods Crest was available on a fully
distributed basis.⁴¹ Thus, lack of availability should not be a reason for a
consumer choosing some brand other than Crest.⁴²


³⁹ The Crest market share rose from about 10% in the pre-endorsement period
to in excess of 30% subsequent to the endorsement.

⁴⁰ This control of respondent and product market variability enables us to
obtain a better idea of the empirical viability of the model in these
contrasting market situations.

⁴¹ The problem of having the product or brand available to consumers in the
distribution system is particularly acute when one attempts to analyze new
product or new brand introductions.

⁴² To be sure, some stock-outs occurred in the rush to try Crest after the
A.D.A. endorsement. This, however, is unlikely to have had any but a minor
influence on brand choice and that over a very short interval of time.

Third, dentifrice is purchased sufficiently often to provide sequences
of brand choices which are long enough to provide ample degrees of freedom for
testing the model.

Fourth,  M.R.C.A.'s  National  Consumer  Panel  has  several  thousand  member 
households.   Such  a  large  sample  of  households  will  enable  us  to  segment  the 
sample  into  meaningful  groups  and  still  have  enough  respondents  in  each  group 
to  have  confidence  that  the  asymptotic  properties  of  the  estimates  hold. 

The population of National Consumer Panel households was segmented by
average interpurchase time in the period prior to the A.D.A. endorsement. This
was done to make model time (which indexes purchase occasions) correspond more
closely to actual time between purchases.

The minimum chi square results for the four before and four after A.D.A.
segments are reported in Table 6-2. The seven purchases of dentifrice
immediately before and after the endorsement were used in the before and after
periods, respectively. A purchase of Crest was coded as a 1, while a purchase
of any other brand was coded 0. The p-level presented in this table represents
the probability of a minimum chi square as large or larger than the one which
was observed given that the model generated the data. We see from the table
that in the worst case (After 1) we would expect a chi square this large or
larger about 15 percent of the time when the model is correct. Since this is a
reasonably likely event, we conclude that the model provides an excellent fit
to the data both in the relatively stable before A.D.A. market and in the
transient after market. We further note that the sample size in all cases is
large enough for the asymptotic results to be excellent approximations.
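The data preparation described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the record
layout (a list of date and brand pairs per household), the function names, the
treatment of a purchase made on August 1, 1960 as an "after" purchase, the
requirement of seven purchases on each side, and the handling of fractional
averages at segment boundaries are all hypothetical and are not taken from the
panel documentation.

```python
from datetime import date, timedelta

ENDORSEMENT = date(1960, 8, 1)  # A.D.A. endorsement date stated in the text
RUN_LENGTH = 7                  # purchases used on each side of the endorsement


def average_interpurchase_days(purchase_dates):
    """Mean gap in days between consecutive purchases (needs two or more dates)."""
    ordered = sorted(purchase_dates)
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def segment(avg_days):
    """Map an average interpurchase time to segments 1-4 of Table 6-2.

    Assumption: the text does not say how fractional averages are assigned.
    """
    if avg_days <= 30:
        return 1
    if avg_days <= 45:
        return 2
    if avg_days <= 60:
        return 3
    return 4


def code_household(purchases, target_brand="Crest"):
    """Return (segment, before_codes, after_codes), or None if too few purchases.

    `purchases` is a list of (date, brand) tuples, a hypothetical layout.
    """
    ordered = sorted(purchases, key=lambda p: p[0])
    before = [p for p in ordered if p[0] < ENDORSEMENT]
    after = [p for p in ordered if p[0] >= ENDORSEMENT]
    if len(before) < RUN_LENGTH or len(after) < RUN_LENGTH:
        return None
    # Segmentation uses only the period prior to the endorsement.
    seg = segment(average_interpurchase_days([d for d, _ in before]))
    # 1 = purchase of the target brand, 0 = purchase of any other brand.
    before_codes = [int(b == target_brand) for _, b in before[-RUN_LENGTH:]]
    after_codes = [int(b == target_brand) for _, b in after[:RUN_LENGTH]]
    return seg, before_codes, after_codes


if __name__ == "__main__":
    # Made-up household: a purchase every 20 days, switching to Crest late.
    pre = [(date(1960, 1, 1) + timedelta(days=20 * k), "Colgate") for k in range(7)]
    pre.append((date(1960, 5, 20), "Crest"))
    post = [(ENDORSEMENT + timedelta(days=20 * k), "Crest") for k in range(7)]
    print(code_household(pre + post))
```

The household in the demonstration is invented purely to exercise the
functions; it is not panel data.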


For a more complete discussion of this point and certain other operational
considerations, see Montgomery [1966a, Chapter 5].

Table 6-2
MINIMUM CHI SQUARE RESULTS

            Average
            Interpurchase     Sample
Segment     Time              Size      $\chi^2_{\min}$    p-Level*

Before 1    0 - 30 Days       637        3.263             0.95
       2    31 - 45 Days      556       12.623             0.19
       3    46 - 60 Days      480       12.377             0.20
       4    Over 60 Days      894        5.111             0.81

After  1    0 - 30 Days       637       13.661             0.15
       2    31 - 45 Days      556        5.523             0.76
       3    46 - 60 Days      480        3.999             0.91
       4    Over 60 Days      894        4.858             0.83

*p-Level = $\int_{\chi^2_{\min}}^{\infty} f(\chi^2)\,d\chi^2$, where $\chi^2$
is chi square with 9 degrees of freedom. The reported p-levels were
interpolated from Table A-3, p. 551, of Ferber and Verdoorn, Research Methods
in Economics and Business, New York: Macmillan Company, 1962.
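The p-levels of Table 6-2 are upper-tail probabilities of a chi square
variate with 9 degrees of freedom. The sketch below evaluates that tail
probability directly, rather than by interpolation in a printed table, using
the standard closed form for odd degrees of freedom. The function name and the
dictionary of minimum chi square values are illustrative choices; the values
themselves are copied from the table. Because the published p-levels were
interpolated, a direct calculation of this kind can differ from them in the
second decimal place.

```python
import math


def chi_square_upper_tail(x, df):
    """P(chi square with odd `df` degrees of freedom >= x).

    Starts from the df = 1 tail, erfc(sqrt(x/2)), and applies the recurrence
    Q(x; k+2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1 or df % 2 != 1:
        raise ValueError("this sketch handles odd degrees of freedom only")
    if x <= 0:
        return 1.0
    half = x / 2.0
    tail = math.erfc(math.sqrt(half))  # tail probability for df = 1
    k = 1
    while k < df:                      # raise the degrees of freedom by 2
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


# Minimum chi square values as reported in Table 6-2.
MIN_CHI_SQUARE = {
    "Before 1": 3.263, "Before 2": 12.623, "Before 3": 12.377, "Before 4": 5.111,
    "After 1": 13.661, "After 2": 5.523, "After 3": 3.999, "After 4": 4.858,
}

if __name__ == "__main__":
    for name, value in MIN_CHI_SQUARE.items():
        print(f"{name}: p-level = {chi_square_upper_tail(value, 9):.3f}")
```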
VII.   SUMMARY

The purpose of this paper has been to present the development of a
heterogeneous, non-stationary zero order response model and to report upon an
initial test of this model as a descriptor of consumer brand choice. It was found
that  the  model  is  empirically  viable  in  at  least  this  one  case.   Clearly,  the 
model  will  need  to  be  tested  in  many  more  market  situations  before  we  will  be 
willing  to  place  more  general  confidence  in  it.   In  addition,  it  is  important 
that  the  performance  of  this  model  be  compared  to  that  of  competing  models  in 
order  to  ascertain  which  model  provides  the  best  "fit"  to  consumer  brand  choice 
behavior. 

While our interest in this paper has centered upon consumer brand choice
behavior, the model may be utilized in a variety of situations. In particular,
it would be interesting to apply the model to voting intentions data from a
multiwave panel. It may also be used to analyze respondent behavior in
experimental situations in psychology and sociology.⁴⁴

Generalization of the model to enable it to handle multi-alternative
(greater than 2) response situations would greatly enhance its utility in
marketing. This generalization as well as further empirical testing are
important issues for future research.


⁴⁴ Coleman [1964a] reports some attempts in this direction, but he lacked the
procedure to test the model and the results in the infinite element case.


APPENDIX 

The  purpose  of  this  appendix  is  to  illustrate  the  derivation  of  the  mean 
value  function  of  the  CEM.   Similar  derivations  may  be  made  for  the  variance 
of  the  CEM,  the  infinitesimal  mean  and  variance  of  the  CEM,  and  the  mean  value 
function and variance of the IEM.

Let  h  be  a  small  increment  of  time  such  that  no  more  than  one  transition 
may  occur  in  the  interval  from  t  to  t+h.   Then  by  the  axioms  M1-M4  the  only 
possible  values  of  X(t+h)-X(t)  and  their  associated  probabilities  of 
occurrence  are  those  given  in  Table  A-1. 

TABLE A-1

$\Delta X(t)$ Conditional on $X(t)$
Cohesive Elements Model

$\Delta X(t) \mid X(t)$          $P[\Delta X(t) \mid X(t) = i/N]$

$X(t+h)-X(t) = 1/N$              $\lambda_i h = (N-i)(\alpha + i\gamma)h$

$X(t+h)-X(t) = 0$                $1 - (\lambda_i + \mu_i)h$

$X(t+h)-X(t) = -1/N$             $\mu_i h = i\,h[\beta + (N-i)\gamma]$

The expectation of X(t+h)-X(t) may be expressed as

(A-1)     $E[X(t+h)-X(t)] = E_{X(t)}\bigl[E_{\Delta X(t) \mid X(t)}[X(t+h)-X(t) \mid X(t) = i/N]\bigr]$

But notice that $E_{\Delta X(t) \mid X(t)}[X(t+h)-X(t) \mid X(t)]$ can be
computed from the results for X(t+h)-X(t) and their associated probabilities
given in Table A-1. We have

(A-2)     $E[X(t+h)-X(t) \mid X(t) = i/N] = \frac{h}{N}\{(N-i)\alpha + (N-i)i\gamma - i\beta - (N-i)i\gamma\}$

          $= \frac{h}{N}\{(N-i)\alpha - i\beta\} = h\{(1 - \frac{i}{N})\alpha - \frac{i}{N}\beta\}$

          $= h\{\alpha - (\alpha+\beta)X(t)\}$

Substituting (A-2) into (A-1) we obtain

(A-3)     $E[X(t+h)-X(t)] = E_{X(t)}\bigl[E[X(t+h)-X(t) \mid X(t) = i/N]\bigr]$

          $= E_{X(t)}[h\{\alpha - (\alpha+\beta)X(t)\}]$

          $= h\{\alpha - (\alpha+\beta)E[X(t)]\}$

Dividing (A-3) by h and taking the limit as $h \to 0$ results in

(A-4)     $\lim_{h \to 0} \frac{E[X(t+h)-X(t)]}{h} = \lim_{h \to 0} \frac{E[X(t+h)] - E[X(t)]}{h}$

          $= \frac{dE[X(t)]}{dt} = \alpha - (\alpha+\beta)E[X(t)]$

The first equality follows from the fact that expectation is a linear operator
while the second equality follows from the definition of the derivative of a
function.

The solution of (A-4) is given by

(A-5)     $E[X(t)] = E[X(0)]\exp\{-(\alpha+\beta)t\} + [\alpha/(\alpha+\beta)][1 - \exp\{-(\alpha+\beta)t\}]$

where E[X(0)] expresses the initial conditions. For an individual respondent
starting at X(0), the conditional expectation of X(t) is given by

(A-6)     $E[X(t) \mid X(0)] = X(0)\exp\{-(\alpha+\beta)t\} + [\alpha/(\alpha+\beta)][1 - \exp\{-(\alpha+\beta)t\}]$
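As a numerical check on this derivation, the following sketch steps the state
probabilities of the N-element cohesive elements process forward in time using
the transition rates of Table A-1 (an upward rate of (N-i)(alpha + i*gamma)
and a downward rate of i[beta + (N-i)*gamma] from state i), and compares the
resulting mean of X(t) = i/N with the closed form (A-6). The parameter values,
the forward Euler time stepping, and all function names are illustrative
assumptions and not part of the original derivation. The check also
illustrates that the cohesion parameter gamma cancels out of the mean value
function.

```python
import math


def cem_rates(i, n, alpha, beta, gamma):
    """Upward and downward transition rates from state i (Table A-1)."""
    lam = (n - i) * (alpha + i * gamma)
    mu = i * (beta + (n - i) * gamma)
    return lam, mu


def mean_closed_form(t, x0, alpha, beta):
    """Conditional mean of X(t) given X(0) = x0, equation (A-6)."""
    c = alpha + beta
    decay = math.exp(-c * t)
    return x0 * decay + (alpha / c) * (1.0 - decay)


def mean_from_master_equation(t_end, i0, n, alpha, beta, gamma, dt=1e-3):
    """Mean of X(t_end) = i/N from Euler steps on the state probabilities.

    Assumption: dt is small enough that at most one transition per step is a
    reasonable approximation, mirroring the small increment h in the text.
    """
    p = [0.0] * (n + 1)
    p[i0] = 1.0  # the respondent starts in state i0 with certainty
    for _ in range(int(round(t_end / dt))):
        new_p = p[:]
        for i in range(n + 1):
            lam, mu = cem_rates(i, n, alpha, beta, gamma)
            up = lam * dt * p[i]    # probability flowing from i to i + 1
            down = mu * dt * p[i]   # probability flowing from i to i - 1
            new_p[i] -= up + down
            if i < n:
                new_p[i + 1] += up
            if i > 0:
                new_p[i - 1] += down
        p = new_p
    return sum((i / n) * p_i for i, p_i in enumerate(p))


if __name__ == "__main__":
    n, alpha, beta, gamma = 10, 0.3, 0.5, 0.2  # illustrative values only
    numeric = mean_from_master_equation(2.0, 1, n, alpha, beta, gamma)
    exact = mean_closed_form(2.0, 1 / n, alpha, beta)
    print(f"numerical mean: {numeric:.5f}   closed form (A-6): {exact:.5f}")
```

Rerunning the check with gamma set to zero leaves the numerical mean unchanged
up to rounding, which is what the cancellation of the gamma terms in (A-2)
implies.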



Bibliography 

1.  Anderson, T. W. and L. A. Goodman (1957), "Statistical Inference About
Markov Chains", Annals of Mathematical Statistics, Vol. XXVIII,
pp. 89-109

2.  Atkinson, R. C., G. H. Bower, and E. J. Crothers (1965), An Introduction
to Mathematical Learning Theory (New York: Wiley)

3.  _____ and W. K. Estes (1963), "Stimulus Sampling Theory", Chapter 10 in
Handbook of Mathematical Psychology, Vol. II, Luce, Bush, and Galanter
(eds.) (New York: Wiley)

4.  Bartlett,  M.  S.  (1962),  An  Introduction  to  Stochastic  Processes  (Cambridge: 
Cambridge  University  Press) 

5.  Bass, F. M., et al. (1961), Mathematical Models and Methods in Marketing
(Homewood, Illinois: Richard D. Irwin, Inc.)

6.  Bharucha-Reid,  A.  T.  (1960),  Elements  of  the  Theory  of  Markov  Processes  and 
Their  Applications  (New  York:   McGraw  Hill) 

7.  Boyd, H. W., Jr. and R. L. Westfall (1960), "An Evaluation of Continuous
Consumer Panels as a Source of Marketing Information", American Marketing
Association

8.  Brown, G. (1952), "Brand Loyalty -- Fact or Fiction?", Advertising Age,
Vol. XXIII, June 9, June 30, October 6, and December 1

9.  Brownlee, K. A. (1960), Statistical Theory and Methodology in Science
and Engineering (New York: Wiley)

10.  Bush, R. R. (1963), "Estimation and Evaluation", in Handbook of
Mathematical Psychology, Vol. I, Luce, Bush, and Galanter (eds.) (New York:
Wiley), pp. 429-469

11.  _____, F. Mosteller, and G. L. Thompson (1954), "A Formal Structure for
Multiple Choice Decisions", in Decision Processes, Thrall, Coombs, and
Davis (eds.) (New York: Wiley)

12.  _____ and F. Mosteller (1955), Stochastic Models for Learning (New York:
Wiley)

13.  Carman,  J.  (1966),  "Brand  Switching  and  Linear  Learning  Models",  Journal 
of  Advertising  Research,  Vol.  VI,  No.  2  (June  1966),  pp.  23-31 

14.  Coleman,  J.  S.  (1963),  "The  Study  of  Consumer  Behavior  in  Repetitive 
Purchases  with  Consumer  Diary  Panels",  Paper  presented  to  the  Research 
Workshop  in  Marketing,  Graduate  School  of  Business  Administration, 
University  of  California,  Berkeley,  July  30,  1963  (ditto) 



15.  Coleman, J. S. (1964a), Models of Change and Response Uncertainty
(Englewood Cliffs, New Jersey: Prentice-Hall, Inc.)

16.  _____ (1964b), Introduction to Mathematical Sociology (New York: The
Free Press of Glencoe)

17.  Copson,  E.  T.  (1935),  An  Introduction  to  the  Theory  of  Functions  of  a 
Complex  Variable  (London:   Oxford  University  Press) 

18.  Cramer, H. (1946), Mathematical Methods of Statistics (Princeton:
Princeton University Press)

19.  Cunningham, R. M. (1956), "Brand Loyalty -- What, Where, How Much?",
Harvard Business Review, Vol. XXXIV, No. 1, pp. 116-128

20.  _____ (1961), "Customer Loyalty to Store and Brand", Harvard Business
Review, Vol. XXXIX, No. 6, pp. 127-37

21.  Demsetz, H. (1962), "The Effect of Consumer Experience on Brand Loyalty
and the Structure of Market Demand", Econometrica, Vol. 30, pp. 22-33

22.  Farley, J. U. (1964), "Why Does Brand Loyalty Vary Over Products?",
Journal of Marketing Research, Vol. I, No. 4, pp. 9-14

23.  Feller, W. (1950), "Diffusion Processes in Genetics", in Second Berkeley
Symposium on Mathematical Statistics and Probability (Berkeley:
University of California Press), pp. 227-246

24.  _____ (1957), An Introduction to Probability Theory and Its Applications,
Vol. I, Second Edition (New York: Wiley)

25.  Ferber, R. and P. J. Verdoorn (1962), Research Methods in Economics and
Business (New York: Macmillan)

26.  Frank, R. E. (1962), "Brand Choice as a Probability Process", Journal of
Business, Vol. XXXV, pp. 43-56

27.  _____, A. A. Kuehn, and W. F. Massy (1962), Quantitative Techniques in
Marketing Analysis (Homewood, Illinois: Richard D. Irwin)

28.  Gani, J. (1961), "On the Stochastic Matrix in a Genetic Model of Moran",
Biometrika, Vol. XLVIII, pp. 203-6

29.  Goldberger, A. S. (1964), Econometric Theory (New York: Wiley)

30.  Haines, G. H., Jr. (1964), "A Theory of Market Behavior After Innovation",
Management Science, Vol. X, No. 4, pp. 634-658

31.  Harary, F. and B. Lipstein (1962), "The Dynamics of Brand Loyalty: A
Markov Approach", Operations Research, Vol. X, No. 1, pp. 19-40

32.  Herniter, J. and J. Magee (1961), "Customer Behavior as a Markov Process",
Operations Research (January-February 1961), pp. 105-22



33.  Hogg, R. V. and A. T. Craig (1959), Introduction to Mathematical Statistics
(New York: Macmillan Company)

34.  Howard, R. A. (1963), "Stochastic Process Models of Consumer Behavior",
Journal of Advertising Research, Vol. III, pp. 35-42

35.  _____ (1965), "Dynamic Inference", Operations Research, September-October
1965, pp. 712-33

36.  Johnston,  J.  (1963),  Econometric  Methods  (New  York:   McGraw-Hill) 

37.  Karlin, S. and J. McGregor (1959), "A Characterization of Birth and Death
Processes", Proceedings of the National Academy of Sciences, Vol. XLV,
pp. 375-379

38.  _____ and _____ (1962), "On a Genetics Model of Moran", Proceedings of the
Cambridge Philosophical Society, Vol. LVIII, pp. 299-311

39.  _____ and _____ (1964), "On Some Stochastic Models in Genetics", in
Stochastic Models in Medicine and Biology, J. Gurland (ed.) (Madison,
Wisconsin: University of Wisconsin Press), pp. 245-79

40.  Kendall, M. G. and A. Stuart (1958), The Advanced Theory of Statistics,
Vol. I (New York: Hafner Publishing Company)

41.  _____ and _____ (1961), The Advanced Theory of Statistics, Vol. II (New
York: Hafner Publishing Company)

42.  Kuehn,  A.  A.  (1958),  "An  Analysis  of  the  Dynamics  of  Consumer  Behavior 
and  Its  Implications  for  Marketing  Management",  Unpublished  Ph.D. 
Dissertation,  Graduate  School  of  Industrial  Administration,  Carnegie 
Institute  of  Technology 

43.  _____ (1962), "Consumer Brand Choice -- A Learning Process?", Journal of
Advertising Research, Vol. II, pp. 10-17

44.  Lazarsfeld, P. (ed.) (1954), Mathematical Thinking in the Social Sciences
(Glencoe, Illinois: The Free Press)

45.  Lipstein, B. (1959), "The Dynamics of Brand Loyalty and Brand Switching",
in Better Measurements of Advertising Effectiveness: The Challenge of the
1960's, Proceedings of the Fifth Annual Conference of the Advertising
Research Foundation, New York

46.  Massy, W. F. (1965), "Estimation of Parameters for Linear Learning Models",
Working Paper No. 78, Graduate School of Business, Stanford University,
October 1965

47.  _____ (1966), "Order and Homogeneity of Family Specific Brand-Switching
Processes", Journal of Marketing Research, Vol. III, No. 1, pp. 48-54

48.  _____ and R. E. Frank (1964), "The Study of Consumer Purchase Sequences
Using Factor Analyses and Simulation", Proceedings of the Business and
Economics Section of the American Statistical Association, December 1964

49.  _____, D. B. Montgomery, and D. G. Morrison (1967), Stochastic Models of
Consumer Behavior, Manuscript in Process

50.  Montgomery,  D.  B.  (1966a),  "A  Probability  Diffusion  Model  of  Dynamic 
Market  Behavior",  Working  Paper  No.  205-66,  Alfred  P.  Sloan  School  of 
Management,  Massachusetts  Institute  of  Technology,  May  1966 

51.  _____ (1966b), "On the Limiting Form of the Contagious Binomial
Distribution and Its Application in Stochastic Models of Choice Behavior",
Working Paper 209-66, Alfred P. Sloan School of Management, Massachusetts
Institute of Technology, August 1966

52.  _____ and G. L. Urban (1967), Management Science in Marketing, Manuscript
in Process, Chapters Available as Working Papers from the Sloan School of
Management, Massachusetts Institute of Technology

53.  Moran,  P.  A.  P.  (1958a),  "Random  Processes  in  Genetics",  Proceedings  of 
the  Cambridge  Philosophical  Society,  Vol.  LIV,  pp.  60-71 

54.  _____ (1958b), "The Distribution of Gene Frequency in a Bisexual Diploid
Population", Proceedings of the Cambridge Philosophical Society,
Vol. LIV, pp. 468-74

55.  Morrison, D. G. (1965a), "New Models of Consumer Behavior: Aids in
Setting and Evaluating Marketing Plans", Proceedings: Fall Conference
of the American Marketing Association, September 1-3, 1965, Washington,
D. C.

56.  _____  (1965b),  "Stochastic  Models  for  Time  Series  with  Applications  in 
Marketing",  Technical  Report  No.  8,  Joint  Program  in  Operations  Research, 
Stanford  University 

57.  _____ (1966a), "Approximate Methods for Calculating the Power of Certain
Goodness-of-Fit Tests", Working Paper No. 89, Graduate School of Business,
Stanford University, 1966

58.  _____ (1966b), "Interpurchase Time and Brand Loyalty", Journal of Marketing
Research, Vol. III, No. 3, August 1966, pp. 289-91

59.  Moses, L. E. (1952), "Non-Parametric Statistics for Psychological Research",
Psychological Bulletin, 49, pp. 122-43

60.  Neyman, J. (1949), "Contribution to the Theory of the χ² Test", in First
Berkeley Symposium on Probability and Statistics (Berkeley, California:
University of California Press), pp. 239-73

61.  Parzen,  E.  (1960),  Modern  Probability  Theory  and  Its  Applications  (New 
York:   Wiley) 

62.  _____ (1962), Stochastic Processes (San Francisco: Holden-Day, Inc.)




63.  Prabhu, N. U. (1965), Stochastic Processes (New York: Macmillan)

64.  Rao, C. R. (1963), "Criteria of Estimation in Large Samples", Sankhya, 25,
pp. 189-206

65.  _____ (1965), Linear Statistical Inference and Its Applications (New
York: Wiley)

66.  Rice, W. T. (1962), "Measurement of Consumer Loyalty: Factor Analysis as
a Market Research Tool", Unpublished M.S. Thesis Submitted to the Alfred
P. Sloan School of Management, Massachusetts Institute of Technology

67.  Raiffa, H. and R. Schlaifer (1961), Applied Statistical Decision Theory
(Boston: Division of Research, Graduate School of Business
Administration, Harvard University)

68.  Siegel,  Sidney  (1956),  Non-Parametric  Statistics  (New  York:   McGraw-Hill) 

69.  Styan, G. P. H. and H. Smith, Jr. (1964), "Markov Chains Applied to
Marketing", Journal of Marketing Research, Vol. I, pp. 50-55

70.  Telser,  L.  G.  (1962),  "Estimating  the  Demand  for  Branded  Goods  from 
Consumer  Panel  Data",  Review  of  Economics  and  Statistics,  August  1962, 
pp.  300-24 

71.  _____ (1963), "Least Square Estimates of Transition Probabilities", in
Measurement in Economics, Christ (ed.) (Stanford, California: Stanford
University Press)

72.  Wilks,  S.  S.  (1962),  Mathematical  Statistics  (New  York:   Wiley) 

73.  Yule, G. Udny, and M. G. Kendall (1945), An Introduction to the Theory of
Statistics (London: Griffin)




